Time series database comprising a plurality of time series database schemas

ABSTRACT

In a computer-implemented method for operating on a time series database including a plurality of time series database schemas, a query of a time series database is received, wherein the time series database includes a plurality of time series database schemas, and wherein each received data point is stored according to each time series database schema of the plurality of time series database schemas, such that the time series database comprises multiple instances of each data point. A query plan is generated according to the query and the plurality of time series database schemas corresponding to a time range.

RELATED APPLICATION

This application is a continuation application of and claims priority to and the benefit of co-pending U.S. application Ser. No. 16/517,353, filed on Jul. 19, 2019, entitled “A TIME SERIES DATABASE COMPRISING A PLURALITY OF TIME SERIES DATABASE SCHEMAS,” by Clement Pang, having Attorney Docket No. F386.04, and assigned to the assignee of the present application.

BACKGROUND

Management, monitoring, and troubleshooting of dynamic environments, including both cloud-based and on-premises products, are increasingly important as the popularity of such products continues to grow. As the quantity of time-sensitive data grows, conventional techniques are increasingly deficient in managing these applications. Conventional techniques, such as relational databases, have difficulty managing large quantities of data and have limited scalability. Moreover, because monitoring analytics over these large quantities of data often have real-time requirements, the deficiencies of relying on relational databases become more pronounced. For instance, conventional databases store data using fixed partitioning schemes, so query response time can be significantly affected by the manner in which the data is partitioned on disk.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and form a part of this specification, illustrate various embodiments and, together with the Description of Embodiments, serve to explain principles discussed below. The drawings referred to in this brief description of the drawings should not be understood as being drawn to scale unless specifically noted.

FIG. 1 is a block diagram illustrating a system for adapting time series database schema, in accordance with embodiments.

FIG. 2 is a block diagram illustrating an example ingestion node for ingesting data points of time series data and adapting a time series database schema, in accordance with embodiments.

FIG. 3A is a block diagram illustrating an example query node for adapting a time series database schema, in accordance with embodiments.

FIG. 3B is a block diagram illustrating an example query planner of a query node for adapting a time series database schema, in accordance with embodiments.

FIG. 4 is a block diagram illustrating a system for maintaining a time series database including a plurality of time series database schemas, in accordance with embodiments.

FIG. 5 is a block diagram illustrating an example ingestion node for ingesting data points of time series data according to a plurality of time series database schemas, in accordance with embodiments.

FIG. 6A is a block diagram illustrating an example query planner of a query node for querying a variably partitioned time series database, in accordance with embodiments.

FIG. 6B is a block diagram illustrating an example query plan executor of a query node for querying a variably partitioned time series database, in accordance with embodiments.

FIG. 7 is a graph illustrating example time series database schema variability over time in a time series database including a single time series database schema for each time instance, in accordance with embodiments.

FIG. 8 is a graph illustrating example time series database schema variability over time in a time series database including multiple time series database schemas for each time instance, in accordance with embodiments.

FIG. 9 is a block diagram of an example computer system upon which embodiments of the present invention can be implemented.

FIG. 10 depicts a flow diagram for adapting time series database schema based on received data points, according to various embodiments.

FIG. 11 depicts a flow diagram for ingesting time series data into a time series database, according to various embodiments.

FIG. 12 depicts a flow diagram for adapting time series database schema based on analysis of received queries, according to various embodiments.

FIGS. 13A and 13B depict flow diagrams for determining whether to adapt a time series database schema, according to embodiments.

FIG. 14 depicts a flow diagram for maintaining a time series database including a plurality of time series database schemas, according to various embodiments.

FIG. 15 depicts a flow diagram for querying a variably partitioned time series database, according to various embodiments.

FIG. 16 depicts a flow diagram for determining the time series database schema corresponding to a time range of a query in a time series database including multiple time series database schemas for each time instance, according to various embodiments.

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

Reference will now be made in detail to various embodiments of the subject matter, examples of which are illustrated in the accompanying drawings. While various embodiments are discussed herein, it will be understood that they are not intended to limit the subject matter to these embodiments. On the contrary, the presented embodiments are intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the various embodiments as defined by the appended claims. Furthermore, in this Description of Embodiments, numerous specific details are set forth in order to provide a thorough understanding of embodiments of the present subject matter. However, embodiments may be practiced without these specific details. In other instances, well known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the described embodiments. As denoted elsewhere herein, like element numbers are intended to indicate like elements or features.

Some portions of the detailed descriptions which follow are presented in terms of procedures, logic blocks, processing and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. In the present application, a procedure, logic block, process, or the like, is conceived to be one or more self-consistent procedures or instructions leading to a desired result. The procedures are those requiring physical manipulations of physical quantities. Usually, although not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in an electronic device.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the description of embodiments, discussions utilizing terms such as “accessing,” “analyzing,” “determining,” “adapting,” “ingesting,” “identifying,” “adding,” “removing,” “ranking,” “receiving,” “dividing,” “executing,” “joining,” “selecting,” or the like, refer to the actions and processes of an electronic computing device or system such as: a host processor, a processor, a memory, a cloud-computing environment, a hyper-converged appliance, a software defined network (SDN) manager, a system manager, a virtualization management server or a virtual machine (VM), among others, of a virtualization infrastructure or a computer system of a distributed computing system, or the like, or a combination thereof. The electronic device manipulates and transforms data represented as physical (electronic and/or magnetic) quantities within the electronic device's registers and memories into other data similarly represented as physical quantities within the electronic device's memories or registers or other such information storage, transmission, processing, or display components.

Embodiments described herein may be discussed in the general context of processor-executable instructions residing on some form of non-transitory processor-readable medium, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or distributed as desired in various embodiments.

In the figures, a single block may be described as performing a function or functions; however, in actual practice, the function or functions performed by that block may be performed in a single component or across multiple components, and/or may be performed using hardware, using software, or using a combination of hardware and software. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure. Also, the example mobile electronic device described herein may include components other than those shown, including well-known components.

The techniques described herein may be implemented in hardware, software, firmware, or any combination thereof, unless specifically described as being implemented in a specific manner. Any features described as modules or components may also be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques may be realized at least in part by a non-transitory processor-readable storage medium comprising instructions that, when executed, perform one or more of the methods described herein. The non-transitory processor-readable data storage medium may form part of a computer program product, which may include packaging materials.

The non-transitory processor-readable storage medium may comprise random access memory (RAM) such as synchronous dynamic random access memory (SDRAM), read only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, other known storage media, and the like. The techniques additionally, or alternatively, may be realized at least in part by a processor-readable communication medium that carries or communicates code in the form of instructions or data structures and that can be accessed, read, and/or executed by a computer or other processor.

The various illustrative logical blocks, modules, circuits and instructions described in connection with the embodiments disclosed herein may be executed by one or more processors, such as one or more motion processing units (MPUs), sensor processing units (SPUs), host processor(s) or core(s) thereof, digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), application specific instruction set processors (ASIPs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. The term “processor,” as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated software modules or hardware modules configured as described herein. Also, the techniques could be fully implemented in one or more circuits or logic elements. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of an SPU/MPU and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with an SPU core, MPU core, or any other such configuration.

OVERVIEW OF DISCUSSION

Example embodiments described herein improve the performance of computer systems by adapting the time series database schema used to store time series data in a time series database.

Time series data points are stored in one or more time series databases, which may reside on one or more storage devices. The time series database is partitioned into multiple shards according to a time series database schema, where a shard is a horizontal partition of the time series database. Data points include multiple searchable dimensions. In some embodiments, the data point dimensions include a metric name, a host name (e.g., source), a timestamp, a metric value, and one or more point tags. The time series database schema defines how the data points are stored: it defines the dimensions used to determine the specificity of a scan (e.g., which partition or shard is scanned) and the dimensions stored as metadata for post-scan filtering in response to a query.
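The split that a schema makes between scan-specificity dimensions and post-filter metadata can be sketched in Python. This is a minimal illustration under stated assumptions: the names `DataPoint`, `Schema`, `shard_dims`, `shard_key`, and `metadata` are hypothetical and not taken from the source, and point tag keys are assumed to be usable as dimension names alongside `metric` and `host`.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class DataPoint:
    # Hypothetical in-memory form of a data point with the dimensions named above.
    metric: str
    host: str
    timestamp: int  # epoch milliseconds
    value: float
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Schema:
    # Dimensions that determine which shard is scanned; other tags become metadata.
    shard_dims: Tuple[str, ...]

    def _dimension(self, point: DataPoint, dim: str) -> str:
        if dim == "metric":
            return point.metric
        if dim == "host":
            return point.host
        return point.tags.get(dim, "")  # assumption: a missing tag shards as ""

    def shard_key(self, point: DataPoint) -> Tuple[str, ...]:
        """Identify the shard (horizontal partition) the point belongs to."""
        return tuple(self._dimension(point, d) for d in self.shard_dims)

    def metadata(self, point: DataPoint) -> Dict[str, str]:
        """Point tags not used for sharding, kept for post-scan filtering."""
        return {k: v for k, v in point.tags.items() if k not in self.shard_dims}
```

Moving a tag key into `shard_dims` is what "adapting the schema" means in this sketch: the tag stops being post-filter metadata and starts selecting the shard.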

In conventional time series databases, the data points can be stored as lexicographically ordered keys that are positionally ordered such that some dimensions (e.g., metric name and host name) are on the left side of the timestamp and other dimensions (e.g., metric value and point tags) are stored on the right side of the timestamp. The data points are stored in the time series database according to a fixed schema, whereby shards are identified by the dimensions on the left side of the timestamp. It should be appreciated that other information can be included in the data point (e.g., customer name), and that this information can also be used for identifying a shard. In general, the data points are stored within a time series database such that information related to the specificity of the scans (e.g., location on the storage device) is stored separately from the information used in post-processing to filter point tags (e.g., metadata).
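A minimal sketch of such a positionally ordered key, assuming a byte-string key/value store that iterates keys in byte order. The NUL separator, the 8-byte big-endian timestamp, and the helper names `series_prefix`, `encode_key`, and `encode_value` are illustrative assumptions, not the layout of any particular database.

```python
import struct
from typing import Dict

SEP = b"\x00"  # assumption: metric and host names never contain NUL


def series_prefix(metric: str, host: str) -> bytes:
    """Left-side dimensions; every key of one metric-host series shares this prefix."""
    return metric.encode() + SEP + host.encode() + SEP


def encode_key(metric: str, host: str, timestamp_ms: int) -> bytes:
    # A fixed-width big-endian timestamp makes byte order equal time order
    # within a series (timestamps are assumed to be non-negative).
    return series_prefix(metric, host) + struct.pack(">Q", timestamp_ms)


def encode_value(value: float, tags: Dict[str, str]) -> bytes:
    """Right-side dimensions: stored as the payload and used only for post-filtering."""
    tag_bytes = ",".join(f"{k}={v}" for k, v in sorted(tags.items())).encode()
    return struct.pack(">d", value) + tag_bytes
```

Because keys sort by metric, then host, then time, a scan for one metric-host pair is a contiguous key range starting at `series_prefix(...)`, while the point tags in the value can only be checked after the bytes have been read.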

In performing a database query, the query is processed according to the left-side dimensions (e.g., the metric and host) to locate the storage location to scan. Point tags that are searched in a query are subject to post-filtering after the scan. Since not all scanned data points are needed, any scanned data point that does not satisfy the query is dropped. In a conventional time series database, a single database scan returns all summaries for a single metric-host pair regardless of point tag filters. The yield of a scan is the percentage of the summaries returned by a single database scan that are actually needed (e.g., that remain after filtering).
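The scan, post-filter, and yield computation described above can be sketched as follows. This is a hedged illustration: a shard is modeled as an in-memory list of `(timestamp, value, point tags)` rows, and the function name `scan_with_post_filter` is hypothetical.

```python
from typing import Dict, List, Tuple

# One stored summary in a metric-host shard: (timestamp, value, point tags).
Row = Tuple[int, float, Dict[str, str]]


def scan_with_post_filter(shard: List[Row], tag_filter: Dict[str, str]) -> Tuple[List[Row], float]:
    """Scan every row of a metric-host shard, then drop rows failing the tag filter.

    Returns the surviving rows and the scan yield (rows kept / rows scanned).
    """
    kept = [row for row in shard
            if all(row[2].get(k) == v for k, v in tag_filter.items())]
    yield_ratio = len(kept) / len(shard) if shard else 0.0
    return kept, yield_ratio
```

With four rows in the shard and only one tagged `env=prod`, a query for `env=prod` reads all four rows to return one, a 25% yield.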

A low yield indicates a database schema that is not designed for efficient querying of a particular dimension. For example, if all metrics have the same metric name and host name, the database includes essentially one shard, potentially providing a low yield, since every search would return summaries for the entire database, which would then be filtered. In other examples, if metrics include a small number of hosts, or if a single shard has many tag sets and query results typically require only a small subset of those tag sets, the queries might result in low yield.

Embodiments described herein provide an autonomous, self-tuning time series database based on the data shape of the ingested data points. As described herein, the time series database schema adapts based on an analysis of the ingestion data shape. By analyzing the ingestion stream, certain characteristics of the time series data can be identified and exploited to improve performance (e.g., improve yield) by adapting the time series database schema. For instance, the data points are analyzed at ingestion to determine key statistics (e.g., hosts per metric, tags per host, partitionability of tag keys, etc.) and to suggest which tag keys can be used to alter the sharding scheme (e.g., the most “selective” tags, “partitioning” tags, frequent tags, changing tags, etc.). In some embodiments, the database schema is automatically adapted based on the analysis. In some embodiments, configuration is provided to ensure that cardinality falls within a certain range. A partitioning strategy can be determined for each stream of data ingested (e.g., on a per-entity basis).
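Statistics of the kind listed above can be gathered in one pass over an ingestion window. The sketch below is a hedged illustration, not the source's analyzer: it computes hosts per metric, distinct tag sets per metric-host series, and the cardinality (number of distinct values) of each tag key per metric. The tuple input format and the function name `data_shape` are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed input format: (metric, host, point tags) for each ingested point.
Point = Tuple[str, str, Dict[str, str]]


def data_shape(points: Iterable[Point]) -> dict:
    hosts = defaultdict(set)                             # metric -> hosts
    tag_values = defaultdict(lambda: defaultdict(set))   # metric -> tag key -> values
    tag_sets = defaultdict(set)                          # (metric, host) -> tag sets
    for metric, host, tags in points:
        hosts[metric].add(host)
        tag_sets[(metric, host)].add(frozenset(tags.items()))
        for key, value in tags.items():
            tag_values[metric][key].add(value)
    return {
        "hosts_per_metric": {m: len(h) for m, h in hosts.items()},
        "tag_key_cardinality": {m: {k: len(v) for k, v in kv.items()}
                                for m, kv in tag_values.items()},
        "tag_sets_per_series": {s: len(t) for s, t in tag_sets.items()},
    }
```

A metric with a single host but many tag sets per series is the low-yield shape described earlier; the per-key cardinalities indicate which tag keys could split such a shard.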

Other embodiments described herein provide an autonomous, self-tuning time series database based on the queries received over time. As described herein, the time series database schema adapts based on an analysis of the received queries. By analyzing the queries, the frequency with which dimensions appear as predicates within queries can be determined. The frequency of appearance can be used to determine whether a dimension is likely to appear within a query, and can be an indication that the time series database schema can be adapted to improve performance (e.g., improve yield). The query analysis, in some embodiments, can also be used to confirm whether to adapt the time series database schema based on the data shape analysis of ingested data. For example, even if the analysis of the ingestion data shape indicates that a shard should be added for a particular dimension, if the query analysis determines that the particular dimension is rarely included as a predicate within a query, an adaptation of the time series database schema may not result in a performance improvement, and the adaptation may not be performed.
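A hedged sketch of the query-side analysis: each query is reduced to the set of dimensions it uses as predicates (extracting predicates from an actual query language is outside this sketch), and the relative frequency of each dimension over the period gates an adaptation proposed by the ingestion analysis. The `min_freq` threshold and both function names are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def predicate_frequency(queries: Iterable[Set[str]]) -> Dict[str, float]:
    """Fraction of queries in the period in which each dimension is a predicate."""
    counts: Counter = Counter()
    total = 0
    for predicates in queries:
        total += 1
        counts.update(set(predicates))  # count a dimension at most once per query
    return {dim: n / total for dim, n in counts.items()} if total else {}


def confirm_adaptation(candidate_dim: str, freq: Dict[str, float],
                       min_freq: float = 0.2) -> bool:
    """Veto a shard dimension suggested by ingestion analysis if queries rarely use it."""
    return freq.get(candidate_dim, 0.0) >= min_freq
```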

For instance, over time it can be determined that the sharding scheme should be changed to store the data points according to different dimensions. For example, it can be determined that the best way to store the data is to shard it by a first dimension at time t₁, by the first dimension and a second dimension at time t₂, and then by the first dimension again at time t₃. It should be appreciated that the historical sharding schemas are maintained so that the correct partitioning scheme is followed during the ingestion of historical data.
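The t₁, t₂, t₃ example corresponds to a small history of schemas that is consulted by timestamp. The sketch below assumes each schema is identified by its tuple of shard dimensions and applies from its start time until the next schema starts; the class name `SchemaHistory` is hypothetical, and the lookup uses the standard `bisect` module.

```python
import bisect
from typing import List, Tuple

ShardDims = Tuple[str, ...]


class SchemaHistory:
    """Every sharding schema ever used, keyed by the time from which it applies."""

    def __init__(self) -> None:
        self._starts: List[int] = []
        self._schemas: List[ShardDims] = []

    def add(self, start: int, dims: ShardDims) -> None:
        i = bisect.bisect_left(self._starts, start)
        if i < len(self._starts) and self._starts[i] == start:
            self._schemas[i] = dims  # replace a schema registered for the same instant
            return
        self._starts.insert(i, start)
        self._schemas.insert(i, dims)

    def schema_at(self, timestamp: int) -> ShardDims:
        """Return the schema in effect at `timestamp` (latest start <= timestamp)."""
        i = bisect.bisect_right(self._starts, timestamp) - 1
        if i < 0:
            raise LookupError("no schema in effect at this timestamp")
        return self._schemas[i]
```

Old schemas are never deleted, so a late-arriving historical point still resolves to the schema that was in effect at its own timestamp.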

When querying data over long time periods, all partition schemes in effect during the queried period need to be considered. It should be appreciated that, in accordance with various embodiments, the time series database schema continuously adapts to ingestion load, requires no user interaction, and requires no re-indexing of data. Some embodiments select and utilize a partitioning scheme available for a single stream at a given time. Even as the data shapes change over time (e.g., every week), the available time series database schema can be updated and utilized.

Embodiments described herein provide methods and systems for adapting the time series database schema of a time series database based on ingested data. Time series data ingested into a time series database according to a time series database schema is accessed over a time period, wherein the time series data comprises a plurality of dimensions. The time series data of the time period is analyzed to determine a data shape of the time series data of the time period. It is determined whether to adapt the time series database schema based at least in part on the data shape of the time series data of the time period. In some embodiments, the time series database schema is adapted based at least in part on the data shape of the time series data of the time period. Time series data is then ingested into the time series database according to the adapted time series database schema.
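The "determine whether to adapt" step can be sketched as a function that either returns adapted shard dimensions or signals that the current schema should be kept. The selection rule is an assumption made only for illustration: add the highest-cardinality tag key whose cardinality falls within a configured range. The `min_card`/`max_card` bounds echo the cardinality configuration mentioned earlier, but their default values and the name `propose_schema` are invented.

```python
from typing import Dict, Optional, Tuple


def propose_schema(current: Tuple[str, ...],
                   tag_cardinality: Dict[str, int],
                   min_card: int = 2,
                   max_card: int = 1000) -> Optional[Tuple[str, ...]]:
    """Return adapted shard dimensions, or None to keep the current schema.

    tag_cardinality maps each tag key to its number of distinct values observed
    in the analyzed time period (e.g., from a data-shape pass over ingestion).
    """
    candidates = [(card, key) for key, card in tag_cardinality.items()
                  if key not in current and min_card <= card <= max_card]
    if not candidates:
        return None  # no tag key would split shards within the allowed range
    _, best_key = max(candidates)  # highest in-range cardinality; ties broken by key name
    return current + (best_key,)
```

A key with a single value cannot split a shard, and a key with very high cardinality would produce a very large number of tiny shards, which is why both ends of the range are bounded in this sketch.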

Other embodiments described herein provide methods and systems for adapting the time series database schema of a time series database based on received queries. A plurality of queries to a time series database received over a time period is accessed, wherein time series data is ingested into the time series database according to a time series database schema, and wherein the time series data comprises a plurality of dimensions. The plurality of queries of the time period is analyzed to determine a relative frequency of the plurality of dimensions within the plurality of queries over the time period. It is determined whether to adapt the time series database schema based at least in part on the relative frequency of the plurality of dimensions within the plurality of queries over the time period. In some embodiments, the time series database schema is adapted based at least in part on the queries of the time period. Time series data is then ingested into the time series database according to the adapted time series database schema.

Embodiments described herein provide a time series database including multiple time series database schemas. In some embodiments, the time series database schemas are variably partitioned, e.g., according to the methods for adapting time series database schema described above. Maintaining multiple time series database schemas within a time series database allows for improved query handling: analyzing a query and directing it to the time series database schema that is best tuned for that query reduces processing time.
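One possible way to direct a query to the best-tuned schema, assuming the lexicographic key layout described earlier: a predicate on a shard dimension only narrows the scan if every dimension to its left is also pinned, so each schema can be scored by the length of its pinned prefix. The scoring rule and the names `pinned_prefix` and `pick_schema` are illustrative assumptions, not the source's query planner.

```python
from typing import Sequence, Set, Tuple

ShardDims = Tuple[str, ...]


def pinned_prefix(dims: ShardDims, predicates: Set[str]) -> int:
    """Number of leading shard dimensions fixed by the query's predicates."""
    n = 0
    for d in dims:
        if d not in predicates:
            break
        n += 1
    return n


def pick_schema(schemas: Sequence[ShardDims], predicates: Set[str]) -> ShardDims:
    # Prefer the longest pinned prefix; on a tie prefer fewer shard dimensions,
    # since unpinned trailing dimensions fan the scan out over more shards.
    return max(schemas, key=lambda dims: (pinned_prefix(dims, predicates), -len(dims)))
```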

Embodiments described herein provide a computer-implemented method for maintaining a time series database including a plurality of time series database schemas. Time series data including data points is received at an ingestion node of a time series database, the data points comprising a plurality of dimensions. A plurality of time series database schemas of the time series database is determined for storing the time series data. The time series data is ingested according to the plurality of time series database schemas, wherein each data point is stored according to each time series database schema of the plurality of time series database schemas, such that the time series database comprises multiple instances of each data point.
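Storing each data point under every schema can be sketched as a simple fan-out. This is a hedged illustration: points are assumed to be dicts keyed by dimension name, an in-memory nested dict stands in for the database, and the name `ingest_all_schemas` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

ShardDims = Tuple[str, ...]


def ingest_all_schemas(points: List[Dict[str, object]], schemas: Sequence[ShardDims]):
    """Store every data point once under each schema.

    Returns {schema: {shard_key: [points]}}; the store therefore holds
    len(schemas) instances of each point (shared references here stand in
    for the physical copies a real database would write).
    """
    store: Dict[ShardDims, Dict[tuple, list]] = defaultdict(lambda: defaultdict(list))
    for point in points:
        for dims in schemas:
            store[dims][tuple(point[d] for d in dims)].append(point)
    return store
```

The trade-off is storage: each additional schema adds one full instance of the data in exchange for a partitioning that some class of queries can scan efficiently.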

Embodiments described herein provide methods for querying a variably partitioned time series database. Running a query against variably partitioned time series data requires additional processing of the query, effectively splitting the query into multiple sub-queries. Variably partitioned time series databases include multiple time series database schemas that vary over time. The time ranges for which particular schemas are applicable are managed, such that queries are bifurcated into multiple sub-queries directed to the particular time series database schemas over the range of the query.
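Bifurcating a query by schema time range can be sketched as an interval intersection. The sketch assumes the schema history is a list of `(start_time, shard_dims)` pairs sorted by start time, each schema applying until the next one starts, and that query ranges are half-open `[start, end)`; the name `split_query` is hypothetical.

```python
from typing import List, Sequence, Tuple

ShardDims = Tuple[str, ...]
SubQuery = Tuple[int, int, ShardDims]  # half-open [start, end) and its schema


def split_query(q_start: int, q_end: int,
                history: Sequence[Tuple[int, ShardDims]]) -> List[SubQuery]:
    """Split the query range into one sub-query per overlapping schema.

    history holds (start_time, shard_dims) pairs sorted by start_time; each
    schema applies until the next one starts, and the last one is open-ended.
    """
    sub_queries: List[SubQuery] = []
    for i, (start, dims) in enumerate(history):
        end = history[i + 1][0] if i + 1 < len(history) else q_end
        lo, hi = max(q_start, start), min(q_end, end)
        if lo < hi:
            sub_queries.append((lo, hi, dims))
    return sub_queries
```

Each sub-query would then be run against its own schema and the partial results combined into a single answer.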


As presented above, time series data monitoring systems typically process very large amounts of data, and query response is highly dependent on the time series database schema used to ingest the time series data. The embodiments described herein greatly extend beyond conventional methods of storing time series data in a time series database of a time series data monitoring system. For instance, adapting the time series database schema based on the data shape of the ingested data points or on the received queries provides improved partitioning of the time series data. Upon query, fewer database scans are needed and the yield of the performed scans increases, which improves query results and reduces the number of I/O operations, thereby improving processing and throughput. By adapting the time series database schema when appropriate, the described embodiments reduce the impact of queries of time series data on processing and throughput.

Accordingly, embodiments of the present invention amount to significantly more than merely using a computer to adapt a time series database schema. Instead, embodiments of the present invention specifically recite a novel process, rooted in computer technology, for determining whether an adaptation to the time series database schema is beneficial and, if such an adaptation is suggested, automatically performing the adaptation to improve performance of the time series data monitoring system, to overcome a problem specifically arising in the realm of monitoring time series data and querying on time series data within computer systems.

Example System for Adapting Time Series Database Schema

FIG. 1 is a block diagram illustrating an embodiment of a system 100 for adapting the time series database schema of time series database 130, according to embodiments. System 100 is a distributed system including multiple ingestion nodes 102a through 102n (collectively referred to herein as ingestion nodes 102) and multiple query nodes 104a through 104n (collectively referred to herein as query nodes 104). Time series data 110 is received at ingestion nodes 102 and stored within time series database 130. Query nodes 104 receive at least one query 120 for querying against time series database 130. Results 125 of query 120 are returned upon execution of query 120.

It should be appreciated that system 100 can include any number of ingestion nodes 102 and query nodes 104. Ingestion nodes 102 and query nodes 104 can be distributed over a network of computing devices in many different configurations. For example, the respective ingestion nodes 102 and query nodes 104 can be implemented where individual nodes independently operate and perform separate ingestion or query operations. In some embodiments, multiple nodes may operate on a particular computing device (e.g., via virtualization), while performing independently of other nodes on the computing device. In other embodiments, many copies of the service (e.g., ingestion or query) are distributed across multiple nodes (e.g., for purposes of reliability and scalability).

Time series data 110 is received at one or more of ingestion nodes 102a through 102n. In some embodiments, time series data includes a numerical measurement of a system or activity that can be collected and stored as a metric (also referred to as a “stream”). For example, one type of metric is CPU load measured over time. Other examples include service uptime, memory usage, etc. It should be appreciated that metrics can be collected for any type of measurable performance of a system or activity. Operations can be performed on data points in a stream. In some instances, the operations can be performed in real time as data points are received. In other instances, the operations can be performed on historical data. Metrics analysis includes a variety of use cases, including online services (e.g., access to applications), software development, energy, Internet of Things (IoT), financial services (e.g., payment processing), healthcare, manufacturing, retail, operations management, and the like. It should be appreciated that the preceding examples are non-limiting, and that metrics analysis can be utilized in many different types of use cases and applications.

In accordance with some embodiments, a data point in a stream (e.g., in a metric) includes a name, a source, a value, and a timestamp. Optionally, a data point can include one or more tags (e.g., point tags). For example, a data point for a metric may include:

- A name: the name of the metric (e.g., CPU_idle, service.uptime)
- A source: the name of an application, host, container, instance, or other entity generating the metric (e.g., web_server_1, app1, app2)
- A value: the value of the metric (e.g., 99% idle, 1000, 2000)
- A timestamp: the timestamp of the metric (e.g., 1418436586000)
- One or more point tags (optional): custom metadata associated with the metric (e.g., location=las_vegas, environment=prod)

Ingestion nodes 102 are configured to process received data points of time series data 110 for persistence and indexing. In some embodiments, ingestion nodes 102 forward the data points of time series data 110 to time series database 130 for storage. In some embodiments, the data points of time series data 110 are transmitted to an intermediate buffer that handles the storage of the data points at time series database 130. In one embodiment, time series database 130 can store and output time series data, e.g., TS1, TS2, TS3, etc. The data can include time series data, which may be discrete or continuous. For example, the data can include live data fed to a discrete stream, e.g., for a standing query. Continuous sources can include analog output representing a value as a function of time. With respect to processing operations, continuous data may be time sensitive, e.g., reacting to a declared time at which a unit of stream processing is attempted, or constant, e.g., a 10V signal. Discrete streams can be provided to the processing operations in timestamp order. It should be appreciated that the time series data may be queried in real time (e.g., by accessing the live data stream) or processed offline (e.g., by accessing the stored time series data).
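As an illustration of providing discrete streams in timestamp order, under the assumption that several discrete streams are each already sorted by timestamp: Python's standard `heapq.merge` can interleave them lazily into a single timestamp-ordered stream. The `(timestamp, value)` sample format and the name `in_timestamp_order` are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, Tuple

Sample = Tuple[int, float]  # hypothetical (timestamp, value) pair


def in_timestamp_order(*streams: Iterable[Sample]) -> Iterator[Sample]:
    """Lazily merge streams that are each already sorted by timestamp."""
    return heapq.merge(*streams, key=lambda sample: sample[0])
```

Because the merge is lazy, it can sit in front of a processing operation on live streams without buffering whole streams in memory.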

In some embodiments, ingestion nodes 102 are also configured to analyze the data points of time series data 110 to determine whether to update the time series database schema of time series database 130. Time series data 110 ingested into time series database 130 according to a time series database schema is accessed over a time period. Time series data 110 of the time period is analyzed to determine a data shape of time series data 110 of the time period. Ingestion nodes 102 determine whether to adapt the time series database schema based at least in part on the data shape of time series data 110 of the time period. In some embodiments, the time series database schema is adapted based at least in part on the data shape of time series data 110 of the time period. Time series data 110 is then ingested into time series database 130 according to the adapted time series database schema.

In some embodiments, ingestion nodes 102 and/or query nodes 104 are configured to analyze queries 120 to determine whether to update the time series database schema of time series database 130. Queries 120 received at query nodes 104 are accessed over a time period. Queries 120 of the time period are analyzed to determine a relative frequency of dimensions within queries 120 over the time period. Ingestion nodes 102 and/or query nodes 104 determine whether to adapt the time series database schema based at least in part on the relative frequency of dimensions within queries 120 over the time period. In some embodiments, the time series database schema is adapted based at least in part on the relative frequency of dimensions within queries 120 of the time period. Time series data 110 is then ingested into time series database 130 according to the adapted time series database schema.

FIG. 2 is a block diagram illustrating an embodiment of an example ingestion node 202 (e.g., one of ingestion nodes 102 a through 102 n of FIG. 1) for ingesting data points 210 of time series data (e.g., time series data 110) for storage in time series database 130 according to at least one time series database schema. In one embodiment, ingestion node 202 receives data points 210, determines whether to update a time series database schema, and ingests data into time series database 130 according to the updated schema. In some embodiments, the dimensionality of each time-series metric is collected to perform the analysis for determining whether to adapt the time series database schema. Ingestion node 202 includes data point receiver 212, time series data analyzer 220, schema update determiner 230, schema updater 240, and data point storage forwarder 250. It should be appreciated that ingestion node 202 is one node of a plurality of ingestion nodes of a distributed system for managing time series data (e.g., system 100).

In the example shown in FIG. 2, data points 210 are received. In one embodiment, time series data including data points 210 is received from an application or system. Data points 210 are received at data point receiver 212. Data point receiver 212 is configured to forward data points 210 to time series data analyzer 220 and data point storage forwarder 250.

Data point storage forwarder 250 is configured to store data points 210 in time series database 130 according to the time series database schema. Because the time series database schema can adapt over time and the described embodiments do not require re-indexing of previously stored data, the time series database schema used to store each particular data point is determined at ingestion. For instance, the proper partition scheme can be identified from the timestamp of each data point, which indicates the partition scheme that was in use when that data point was generated. It should be appreciated that different metrics in a single batch, and data points from different time periods for a single metric, can be ingested with different schemas.
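The timestamp-based selection described above can be illustrated as a lookup into a per-metric history of partition schemes. The following is a minimal Python sketch under stated assumptions, not the described system's implementation: the `SchemaHistory` class, its `add` and `scheme_at` methods, and the `partition_key` helper are hypothetical names, and a partition scheme is modeled simply as a tuple of dimension names plus the timestamp at which it took effect.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class SchemaHistory:
    """Hypothetical per-metric history of partition schemes (tuples of dimension names)."""
    starts: list = field(default_factory=list)   # effective-from timestamps, ascending
    schemes: list = field(default_factory=list)  # scheme for each entry in `starts`

    def add(self, effective_from, dimensions):
        # Insert in timestamp order so lookups can use binary search.
        i = bisect.bisect_right(self.starts, effective_from)
        self.starts.insert(i, effective_from)
        self.schemes.insert(i, tuple(dimensions))

    def scheme_at(self, timestamp):
        # The applicable scheme is the latest one that took effect at or before `timestamp`.
        i = bisect.bisect_right(self.starts, timestamp) - 1
        if i < 0:
            raise LookupError(f"no partition scheme in effect at {timestamp}")
        return self.schemes[i]


def partition_key(history, point):
    """Route a point (assumed: a dict of dimension -> value plus 'timestamp')
    using the scheme that was in effect when the point was generated."""
    dims = history.scheme_at(point["timestamp"])
    return dims, tuple(point.get(d) for d in dims)
```

For example, after recording schemes that take effect at timestamps 100, 200, and 300, a late-arriving point stamped 250 is still routed by the scheme that took effect at 200, so historical data lands where it would have been stored when it was generated.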

Time series data analyzer 220 receives data points 210 and analyzes data points 210 received over a particular time period. For example, time series data analyzer 220 can analyze data points 210 received (e.g., having timestamps) over a prior 24-hour time period, a seven-day time period, or any other time period. In some embodiments, time series data analyzer 220 analyzes a data shape of data points 210 over the time period. The term “data shape” as used herein refers to the contribution of different dimensions of data points 210 (e.g., how many dimensions per metric, how many hosts per metric, how many distinct point tags) and the frequency at which different dimensions are included in the data points 210.
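A data shape summary of the kind described above might be computed as follows. This is a sketch only: the point layout (a dict with `metric`, `host`, and a `tags` dict of point tags) and the `data_shape` function are assumptions for illustration, and the statistics returned are limited to the examples named in the text.

```python
from collections import defaultdict


def data_shape(points):
    """Summarize the data shape of one analysis window of points."""
    hosts = defaultdict(set)       # metric -> distinct hosts
    tag_keys = defaultdict(set)    # metric -> distinct point-tag keys
    tag_values = defaultdict(set)  # (metric, tag key) -> distinct tag values
    tag_count = defaultdict(int)   # tag key -> number of points carrying it
    for p in points:
        m = p["metric"]
        hosts[m].add(p["host"])
        for k, v in p["tags"].items():
            tag_keys[m].add(k)
            tag_values[(m, k)].add(v)
            tag_count[k] += 1
    return {
        "hosts_per_metric": {m: len(h) for m, h in hosts.items()},
        "tag_keys_per_metric": {m: len(k) for m, k in tag_keys.items()},
        "distinct_values": {mk: len(v) for mk, v in tag_values.items()},
        # Fraction of points in the window that include each tag key.
        "tag_frequency": {k: n / len(points) for k, n in tag_count.items()},
    }
```

Selecting the window (e.g., only points timestamped within the prior 24 hours) is left to the caller in this sketch.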

In one embodiment, time series data analyzer 220 provides statistics used to make a determination as to whether to update the schema used for storing data points 210 in time series database 130. Examples of generated statistics include how many tags per host, how many hosts per metric, the partitionability of tag keys, etc. The analysis performed by time series data analyzer 220 is used by schema update determiner 230 in determining whether to change the schema.

Time series data analyzer 220 is configured to analyze data points 210 over a particular time period to determine whether the time series database schema, also referred to as “sharding parameters,” should be changed. A shard of time series database 130 determines the selectivity of the data stored therein. The determination as to whether to change the sharding parameters can be based on the projected selectivity and yield of time-series queries. For example, if it is projected that a particular dimension would improve the yield of queries, that dimension can be considered as a sharding parameter. In one embodiment, a scoring of the dimensions of data points 210 is performed, where the scoring is based on an analysis of the frequency of the dimensions within the data points 210. In some embodiments, the scoring is subjected to a target, such that a score that satisfies the target or best satisfies the target is indicative of a dimension that is a candidate sharding parameter.

For example, a scoring operation includes a target range of time series per partition (e.g., 10-50). The current partition scheme of the time series database schema is compared to the target to determine whether the current partition scheme satisfies the target. An analysis of the dimensions of data points 210 over the time period can be performed to determine whether dimensions not used as partition parameters would improve or satisfy the target and/or whether removing dimensions currently used as partition parameters would improve or satisfy the target.
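The target-range comparison can be sketched as follows. As assumptions for illustration, each time series is represented by a dict of its dimension values, a scheme is scored by the fraction of its partitions whose series count falls within the target range (10-50 by default, following the example above), and only single-dimension additions or removals are considered; the function names are hypothetical.

```python
from collections import Counter


def series_per_partition(series, dims):
    """Count distinct series falling into each partition keyed by `dims`."""
    return Counter(tuple(s.get(d) for d in dims) for s in series)


def score_scheme(series, dims, target=(10, 50)):
    """Fraction of partitions whose series count lies within the target range."""
    counts = series_per_partition(series, dims)
    lo, hi = target
    inside = sum(1 for c in counts.values() if lo <= c <= hi)
    return inside / len(counts) if counts else 0.0


def best_adjustment(series, current, candidates, target=(10, 50)):
    """Compare the current scheme with single-dimension additions and removals."""
    current = list(current)
    options = [current]
    options += [current + [d] for d in candidates if d not in current]
    options += [[x for x in current if x != d] for d in current]
    # max() keeps the first of equally scored options, so the current scheme wins ties.
    return max(options, key=lambda dims: score_scheme(series, dims, target))
```

With 100 series of one metric spread evenly over four clusters, partitioning by metric alone yields a single partition of 100 series, while adding the cluster dimension yields four partitions of 25, all inside the target, so `best_adjustment` proposes adding the cluster dimension.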

In another example, time series data analyzer 220 determines the divisibility of each dimension over the streams of data points 210. In another embodiment, time series data analyzer 220 determines the cardinality of each dimension over the streams of data points 210. The divisibility and cardinality can be used by schema update determiner 230 to determine whether to adapt the schema. In one embodiment, time series data analyzer 220 performs a cardinality analysis. The cardinality analysis can take into consideration a number of factors, such as the minimum and maximum number of times a time series reports within a time period (e.g., 24 hours) and a tag partition power. Only point tags that can partition the incoming streams by the tag partition power are considered as sharding parameters.
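A cardinality filter of this kind might look like the sketch below. The text does not define the tag partition power precisely, so this sketch assumes it is the minimum number of distinct values a tag must take across the incoming streams; it also assumes that series reporting fewer than `min_reports` times in the window are ignored. How the maximum report count is used is not specified, so it is omitted. The function name and parameters are hypothetical.

```python
from collections import defaultdict


def candidate_tags(report_counts, partition_power=4, min_reports=1):
    """Return point-tag keys that can split the incoming streams at least
    `partition_power` ways.

    report_counts maps each series (a frozenset of (tag, value) pairs) to the
    number of times it reported within the analysis window (e.g., 24 hours).
    """
    values = defaultdict(set)
    for series, reports in report_counts.items():
        if reports < min_reports:
            continue  # ignore series that barely reported in the window
        for tag, value in series:
            values[tag].add(value)
    # A tag qualifies only if it has at least `partition_power` distinct values.
    return sorted(tag for tag, seen in values.items() if len(seen) >= partition_power)
```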

The information generated by time series data analyzer 220 can be used to build a partition scheme on a per-metric basis that can adapt over time. For instance, over time it can be determined that the best way to store the data at time t₁ is to shard the data by a first dimension, at time t₂ to shard the data by the first dimension and a second dimension, and then at time t₃, back to the first dimension. It should be appreciated that the historical sharding schemas are maintained so that during the ingestion of historical data the correct partitioning scheme is followed.

Schema update determiner 230 receives analytical data from time series data analyzer 220 and is configured to make a determination regarding whether and how to change the schema for use in time series database 130. Using the analytical data, schema update determiner 230 determines whether an update to the time series database schema would improve the query performance of the time series database 130 by adapting the partition parameters. In one embodiment, changing the schema includes determining whether a dimension should be escalated as a partition parameter or deescalated to no longer be a partition parameter.

Schema update determiner 230 is configured to determine whether a change to the sharding parameters of data points 210 would improve query performance. When performing a query, in general, it is desirable to perform fewer scans against the back end (e.g., time series database 130). By making dimensions that are statistically indicative of being well partitioned into sharding parameters, reads to the back end may be reduced. It should be appreciated that such a determination is based on analysis of previously received data points 210, and that changes to the dimensional makeup of future data points may render changes to the schema less effective.

In one embodiment, schema update determiner 230 receives scoring information from time series data analyzer 220. The scoring information may include a ranking of the dimensions relative to the targets defined by the scoring operation. Schema update determiner 230, using the scoring information, makes a determination as to whether a dimension should be added as a partition parameter or removed as a partition parameter. In some embodiments, schema update determiner 230 selects the top one or two dimensions from the scoring information as partition parameters. In some embodiments, schema update determiner 230 removes the bottom one or two dimensions from the scoring information as partition parameters. It should be appreciated that some dimensions (e.g., metric or host) may not be removed as partition parameters.
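The promote/demote step can be sketched as below, assuming the scoring information arrives as a best-first list of dimension names. The function name, its `promote` and `demote` counts (one by default, in line with the “top one or two” wording), and the protected set containing metric and host are illustrative choices rather than the described system's interface.

```python
PROTECTED = frozenset({"metric", "host"})  # example dimensions that are never demoted


def adjust_partition_params(current, ranking, promote=1, demote=1, protected=PROTECTED):
    """Promote top-ranked dimensions and demote bottom-ranked ones.

    `ranking` lists dimensions best-first, as produced by a scoring step.
    """
    result = list(current)
    top = set(ranking[:promote])
    for d in ranking[:promote]:
        if d not in result:
            result.append(d)
    # Walk the ranking from the bottom, skipping protected and top-ranked dimensions.
    demotable = [d for d in reversed(ranking)
                 if d in result and d not in protected and d not in top]
    for d in demotable[:demote]:
        result.remove(d)
    return result
```

Excluding the top-ranked dimensions from demotion keeps a short ranking from promoting and then immediately removing the same dimension.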

In some embodiments, schema update determiner 230 automatically adapts the time series database schema according to the satisfaction of a desired partitionability of the time series data 110. In some embodiments, schema update determiner 230 determines that the time series database schema should be updated to improve performance, and generates a notification 232 for a user to confirm or effectuate the schema update. User input 235 is received to effectuate or ignore the suggested schema update.

Schema updater 240, in response to a determination from schema update determiner 230 that the time series database schema is to be updated, effectuates an update of the time series database schema. Schema updater 240 directs data point storage forwarder 250 to store data in time series database 130 according to the adapted schema by updating the data point processing performed at data point storage forwarder 250.

In one embodiment, schema updater 240 notifies other ingestion nodes 202 by transmitting schema update 245 to the other ingestion nodes (e.g., ingestion nodes 102 a through 102 n). In one embodiment, schema updater 240 includes a multicaster for multicasting schema update 245 to a plurality of ingestion nodes.

FIG. 3A is a block diagram illustrating an embodiment of example query node 104 (e.g., one of query nodes 104 a through 104 n of FIG. 1) for adapting a time series database schema, according to embodiments. In one embodiment, query node 104 generates a query plan for the time series data based on the query 310. Query node 104 includes a parser 304, a planner 306, and an executor 308. Query node 104 can be implemented by a query execution engine configured to parse a query at parser 304, produce a query execution plan at planner 306, and, at executor 308, fetch time series data, run the time series data through processing operations, and determine an answer or response to the query.

In the example shown in FIG. 3A, a query 310 is received. In one embodiment, the query 310 is provided by a user via a client. Time series data is provided by time series database 130. The data can include time series data, which may be discrete or continuous. Query 310 is received for searching the time series data. A query can include elements that define searchable parameters of the time series data. For example, the query can include elements defining terms related to metrics, sources, values, timestamps, and/or point tags for isolating and returning relevant results. The parser 304 receives a query 310 and parses the query for a predicate (e.g., elements and operators). The predicate forms at least part of a basis for generating a query plan. For instance, consider the example query:

ts(“*graf*”, host=“*2*” and tag=app and (status=production or role=app) and cluster=mon and cpu=cpu-total)

The example query is parsed into the predicate including the elements and operators:

• metric = “*graf*” AND
• host = “*2*” AND
• tag = app AND
• (status = production OR role = app) AND
• cluster = mon AND
• cpu = cpu-total
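To make the parsed predicate concrete, it can be represented as a small tree and evaluated against the dimension values of a series. This is a sketch only: the tuple-based tree, the `matches` function, and the use of shell-style wildcards through Python's `fnmatch.fnmatchcase` for patterns such as “*graf*” are assumptions, not the actual semantics of the query language. The tree mirrors the example query, including its `tag=app` clause.

```python
from fnmatch import fnmatchcase

# Internal nodes are ("and" | "or", [children]); leaves are (dimension, pattern).
PREDICATE = ("and", [
    ("metric", "*graf*"),
    ("host", "*2*"),
    ("tag", "app"),
    ("or", [("status", "production"), ("role", "app")]),
    ("cluster", "mon"),
    ("cpu", "cpu-total"),
])


def matches(node, series):
    """Evaluate a predicate tree against a dict of dimension -> value."""
    op, arg = node
    if op == "and":
        return all(matches(child, series) for child in arg)
    if op == "or":
        return any(matches(child, series) for child in arg)
    value = series.get(op)  # leaf: `op` is the dimension name, `arg` the pattern
    return value is not None and fnmatchcase(value, arg)
```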

The planner 306 receives the parsed elements and operators of query 310 and generates a query plan for retrieval of relevant time series data that resolves the query 310. The planner 306 determines operations to be performed on the relevant time series data to retrieve a result of the query 310.

In operation, planner 306 receives a query. Planner 306 generates a query plan for determining what to retrieve from time series database 130 based on the query. For example, planner 306 determines how many scans to make on the time series database(s). The planner 306 then hands off commands (e.g., a query plan) to executor 308 to perform an execution phase, e.g., beginning execution of the query 310. The executor 308 then outputs an answer to the query 316. Although shown as a single stream, the answer to the query 316 can include one or more streams.

FIG. 3B is a block diagram illustrating an example query planner 306 for adapting a time series database schema, in accordance with embodiments. In the example shown in FIG. 3B, query 310 is received for searching the time series data. In one embodiment, query 310 is a parsed query received from parser 304. Query 310 is received at query receiver 352. Query receiver 352 is configured to forward query 310 to query analyzer 360.

Query analyzer 360 receives query 310 and analyzes multiple queries 310 received over a particular time period. For example, query analyzer 360 can analyze queries 310 received (e.g., having timestamps) over a prior 24-hour time period, a seven-day time period, or any other time period. In some embodiments, query analyzer 360 analyzes the queries 310 of the time period to determine a relative frequency of the plurality of dimensions within the queries 310 over the time period. The relative frequency of the dimensions of queries 310 is used to determine whether adapting the time series database schema is projected to improve the performance (e.g., yield) of future queries 310.

In one embodiment, query analyzer 360 analyzes the queries 310 over the time period for use in making a determination as to whether to update the schema used for storing data points 210 in time series database 130. For example, query analyzer 360 may collect statistics on the received queries 310, keeping track of the dimensions that are queried on. The statistics may include a count of each instance of a dimension being included in a query 310. By understanding the frequency of dimensions within queries 310, it can be determined which dimensions are queried on more frequently, and the time series database schema can be adapted to provide more efficient querying on the frequently queried terms. Similarly, if a dimension is never queried, regardless of its partitionability, it can be discarded as a partition parameter, as it is not relevant to improving the performance of queries. The analysis performed by query analyzer 360 is used by schema update determiner 370 in determining whether to change the schema.
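Collecting these statistics can be sketched as counting, per query, which dimensions appear in its predicate. This sketch assumes predicates are trees of `("and" | "or", [children])` internal nodes and `(dimension, value)` leaves; the function names are hypothetical, and each dimension is counted at most once per query.

```python
from collections import Counter


def predicate_dimensions(node):
    """Collect the dimension names referenced anywhere in a predicate tree."""
    op, arg = node
    if op in ("and", "or"):
        dims = set()
        for child in arg:
            dims |= predicate_dimensions(child)
        return dims
    return {op}  # leaf: (dimension, value)


def dimension_frequency(queries):
    """Relative frequency with which each dimension appears as a predicate."""
    counts = Counter()
    for q in queries:
        counts.update(predicate_dimensions(q))  # a dimension counts once per query
    return {d: n / len(queries) for d, n in counts.items()} if queries else {}
```

Current partition parameters that never appear in the result, e.g., `set(current_params) - set(freq)`, are the never-queried dimensions that the text suggests may be discarded.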

Query analyzer 360 is configured to analyze queries 310 over a particular time period to determine whether the sharding parameters should be changed. A shard of time series database 130 determines the selectivity of the data stored therein. The determination as to whether to change the sharding parameters can be based on the projected selectivity and yield of time-series queries. For example, if it is projected that a particular dimension would improve the yield of queries, that dimension can be considered as a sharding parameter.

The information generated by query analyzer 360 can be used to build a partition scheme on a per-metric basis that can adapt over time. For instance, over time it can be determined that the best way to store the data at time t₁ is to shard the data by a first dimension, at time t₂ to shard the data by the first dimension and a second dimension, and then at time t₃, back to the first dimension. It should be appreciated that the historical sharding schemas are maintained so that during the ingestion of historical data the correct partitioning scheme is followed.

By analyzing queries 310, for instance, it can be determined that a single time-series stream is predominantly “selected” for querying with particular predicate dimensions. For example, a cluster of time series might all be named “cpu.total” and tagged with “tenant=<some_tenant>”. An analysis of the queries 310 indicates that the queries 310 are always or primarily queried with a predicate on “tenant”. As such, it is known that the dimension “tenant” of the time series data is the most selective dimension. In such an example, the time series database schema can be adapted to inject a shard for the dimension “tenant” to enhance the ability to select the proper data in response to a query. It should be appreciated that the “tenant” dimension may not yield the most partitions, but because it is a frequent query predicate, sharding according to this predicate is likely to benefit query processing.

Schema update determiner 370 receives analytical data from query analyzer 360 and is configured to make a determination regarding whether and how to change the schema for use in time series database 130. Using the analytical data, schema update determiner 370 determines whether an update to the time series database schema would improve the query performance of the time series database 130 by adapting the partition parameters. In one embodiment, changing the schema includes determining whether a dimension should be escalated as a partition parameter or deescalated to no longer be a partition parameter.

Schema update determiner 370 is configured to determine whether a change to the sharding parameters of the time series data would improve query performance. When performing a query, in general, it is desirable to perform fewer scans against the back end (e.g., time series database 130). By making dimensions that are statistically indicative of being well partitioned into sharding parameters, reads to the back end may be reduced. It should be appreciated that such a determination is based on analysis of previously received queries 310, and that changes to the dimensional makeup of future queries may render changes to the schema less effective.

In one embodiment, schema update determiner 370 receives statistics from query analyzer 360. The statistics may include a count of the number of times each dimension has appeared as a predicate in a query, or a relative frequency of appearance of each dimension as a predicate in a query. The statistics may include a ranking of the dimensions according to the counts. Schema update determiner 370, using the count or frequency information, makes a determination as to whether a dimension should be added as a partition parameter or removed as a partition parameter. In some embodiments, schema update determiner 370 selects the top one or two dimensions that appear as predicates in queries as partition parameters. In some embodiments, schema update determiner 370 removes, as partition parameters, the one or two dimensions having the fewest counts or lowest frequency of appearance as a predicate in queries. It should be appreciated that some dimensions (e.g., metric or host) may not be removed as partition parameters.

In some embodiments, schema update determiner 370 automatically adapts the time series database schema according to the satisfaction of a desired partitionability of the time series data 110. In some embodiments, schema update determiner 370 determines that the time series database schema should be updated to improve performance, and generates a notification 372 for a user to confirm or effectuate the schema update. User input 375 is received to effectuate or ignore the suggested schema update.

Schema updater 380, in response to a determination from schema update determiner 370 that the time series database schema is to be updated, effectuates an update of the time series database schema. Schema updater 380 directs the ingestion nodes (e.g., ingestion nodes 102 a through 102 n of FIG. 1) to store data in time series database 130 according to the adapted schema by updating the data point processing (e.g., at data point storage forwarder 250 of FIG. 2). In one embodiment, schema updater 380 notifies the ingestion nodes 102 by transmitting schema update 385 to the ingestion nodes (e.g., ingestion nodes 102 a through 102 n). In one embodiment, schema updater 380 includes a multicaster for multicasting schema update 385 to a plurality of ingestion nodes.

In some embodiments, query analyzer 360 and schema update determiner 370 can be used in combination with schema update determiner 230 of ingestion node 102. For instance, schema update determiner 230 receives analytics on the data shape of ingested data, and a particular dimension is being considered for inclusion as a sharding parameter. Schema update determiner 230 can collaborate with schema update determiner 370 to determine whether the time series database schema should be updated to include this particular dimension based on the appearance of the dimension as a predicate in queries. For example, the data shape analysis suggests that the particular dimension is partitionable and should be considered as a candidate sharding parameter. However, if the particular dimension does not appear in queries as a predicate, or has a low relative frequency of appearance, adapting the time series database schema to include this particular dimension as a sharding parameter would not improve query response performance. As such, this particular dimension can be removed from consideration (at this time) as a sharding parameter.
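The collaboration between the two determiners can be sketched as a filter over the data-shape candidates. In this sketch, `query_freq` is assumed to map each dimension to its relative frequency as a query predicate (as a query analysis such as that of query analyzer 360 might produce), and the `min_freq` threshold of 0.1 is an arbitrary illustrative value; the function name is hypothetical.

```python
def confirm_candidates(shape_candidates, query_freq, min_freq=0.1):
    """Keep partitionable dimensions (from data-shape analysis) that are also
    queried often enough to be worth sharding on.

    Survivors are ordered by how often they appear as query predicates;
    dimensions absent from `query_freq` are treated as never queried.
    """
    kept = [d for d in shape_candidates if query_freq.get(d, 0.0) >= min_freq]
    return sorted(kept, key=lambda d: query_freq[d], reverse=True)
```

Here a dimension such as cluster may split the data well, yet it is dropped from consideration if it rarely appears as a predicate.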

It should be appreciated that, in accordance with some embodiments, query receiver 352, query analyzer 360, schema update determiner 370, and schema updater 380 may alternatively be implemented within an ingestion node 102. In such embodiments, queries 120 received at query nodes 104 are accessed by ingestion nodes 102 subsequent to receipt at query nodes 104.

Hence, the embodiments of the present invention greatly extend beyond conventional methods of storing time series data in a time series database of a time series data monitoring system. For instance, adapting the time series database schema based on a data shape of the ingested data points or on the queries provides improved partitioning of the time series data. Upon query, fewer scans to the database are needed, and the yield of the performed scans is increased, providing improved results to queries and reducing the number of I/O operations, thereby improving processing and throughput. By adapting the time series database schema when appropriate, the described embodiments reduce the impact on processing and throughput of queries of time series data.

Accordingly, embodiments of the present invention amount to significantly more than merely using a computer to adapt a time series database schema. Instead, embodiments of the present invention specifically recite a novel process, rooted in computer technology, for determining whether an adaptation to the time series database schema is beneficial and, if such an adaptation is suggested, automatically performing the adaptation to improve performance of the time series data monitoring system, to overcome a problem specifically arising in the realm of monitoring time series data and querying on time series data within computer systems.

Example System for Maintaining a Time Series Database Including Multiple Time Series Database Schemas

Embodiments described herein provide a time series database including multiple time series database schemas. In some embodiments, the time series database schemas are variably partitioned, e.g., according to the methods for adapting time series database schema described above. Maintaining multiple time series database schemas within a time series database allows for improved query handling by running a query against a time series database schema that is most tuned for the particular query, thus reducing processing time by directing the query to the appropriate time series database schema based on an analysis of the query.

In some embodiments, data points can be stored in multiple instances with different time series database schemas, e.g., when the ingestion/query patterns of particular data demand it. For example, each data point is stored N times rather than once, where N is the number of time series database schemas available. During query execution, rather than having only one possible time series database schema at any given time, there are multiple time series database schemas to consider for different time ranges, and the system chooses one time series database schema to query against. In one embodiment, the system chooses the time series database schema that is most selective (e.g., includes all the data that is the object of the query and the least amount of data to be filtered out). While the query requires all the data that is the object of the query, reducing the amount of data to be filtered is an optimization related to the yield of a scan.
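Choosing the most selective schema can be sketched as minimizing the amount of data a scan would read. As assumptions for illustration, a schema is modeled by its partition dimensions, a query by its equality constraints, and a scan can prune partitions only on schema dimensions that the query pins to a single value. The series list stands in for whatever statistics a real planner would consult, and the function names are hypothetical.

```python
def scanned_series(series, schema_dims, equalities):
    """Number of series a scan must read under a schema: partitions can only be
    pruned on schema dimensions that the query pins to a single value."""
    pins = {d: v for d, v in equalities.items() if d in schema_dims}
    return sum(1 for s in series if all(s.get(d) == v for d, v in pins.items()))


def choose_schema(series, schemas, equalities):
    """Pick the schema (name -> partition dimensions) that leaves the least data to filter.

    Every schema holds a full copy of the data, so any choice returns the right
    answer; the choice only affects the yield of the scan.
    """
    return min(schemas, key=lambda name: scanned_series(series, schemas[name], equalities))
```

With 100 series spread over 10 tenants, a query pinned to one tenant reads all 100 series under a host-partitioned schema but only 10 under a tenant-partitioned schema, so the latter is chosen.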

FIG. 4 is a block diagram illustrating an embodiment of a system 400 for maintaining time series database 130 including a plurality of time series database schemas 410 a through 410 n. System 400 operates in substantially the same manner as system 100, and can include the same components, where like element numbers are intended to indicate like elements or features.

Time series data 110 is received at ingestion nodes 102, where ingestion nodes 102 are configured to process received data points of time series data 110 for persistence and indexing. In some embodiments, ingestion nodes 102 forward the data points of time series data 110 to time series database 130 for storage. In some embodiments, the data points of time series data 110 are transmitted to an intermediate buffer for handling the storage of the data points at time series database 130. In one embodiment, time series database 130 can store and output time series data, e.g., TS1, TS2, TS3, etc. The data can include time series data, which may be discrete or continuous.

Time series database 130 includes data stored according to multiple time series database schemas, illustrated as time series database schemas 410 a through 410 n. It should be appreciated that time series database 130 can include time series data stored according to any number of time series database schemas, and is not intended to be limited to the illustrated embodiment. Moreover, it should be appreciated that the number of time series database schemas can vary over time (e.g., two time series database schemas from t₀ through t₁, three time series database schemas from t₁ through t₂, two time series database schemas from t₂ through t₃, etc.).

Ingestion nodes 102 receive time series data 110 including data points, where the data points include a plurality of dimensions. Ingestion nodes 102 determine the time series database schemas that are utilized upon receipt of time series data 110. As described above, it should be appreciated that the time series database schemas can adapt over time. In some embodiments, a time stamp of a data point is accessed, and the time series database schema(s) applicable for the time stamp are identified. The time series data is ingested according to the plurality of time series database schemas, wherein each data point is stored in time series database 130 according to each time series database schema of the plurality of time series database schemas. As such, time series database 130 includes multiple instances of each data point, one for each time series database schema in use at the time of ingestion or according to the time stamp of the data point.
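The fan-out described above, one stored instance per schema in effect at a point's time stamp, can be sketched as follows. The `Point` type, the schedule of `(start, end, name, dimensions)` entries with half-open validity windows, and the function names are assumptions for illustration; the sketch only computes the writes and does not model the storage layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    metric: str
    timestamp: int
    value: float
    tags: tuple  # (key, value) pairs, e.g. (("cluster", "mon"), ("host", "h1"))


def active_schemas(schedule, ts):
    """schedule: (start, end, name, dims) entries; a schema applies when start <= ts < end."""
    return [(name, dims) for start, end, name, dims in schedule if start <= ts < end]


def fan_out(point, schedule):
    """Produce one (schema name, partition key, point) write per schema active at the point's time stamp."""
    dims_by_name = dict(point.tags, metric=point.metric)
    writes = []
    for name, dims in active_schemas(schedule, point.timestamp):
        key = tuple(dims_by_name.get(d) for d in dims)
        writes.append((name, key, point))
    if not writes:
        raise LookupError(f"no schema covers timestamp {point.timestamp}")
    return writes
```

With schema A in effect for all time and schema B only between 100 and 200, a point stamped 150 is written twice (once under each schema's partition key), while a point stamped 50 is written once.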

FIG. 5 is a block diagram illustrating an embodiment of an example ingestion node 502 (e.g., one of ingestion nodes 102 a through 102 n of FIG. 1) for ingesting data points 210 of time series data (e.g., time series data 110) for storage in time series database 130 according to a plurality of time series database schemas. It should be appreciated that ingestion node 502 can also include the components of ingestion node 202, and vice versa, and that the described operations of ingestion node 502 and ingestion node 202 are separated herein so as to not obfuscate the described embodiments.

In one embodiment, ingestion node 502 receives data points 505 and ingests data into time series database 130 according to the plurality of time series database schemas 410 a through 410 n. In some embodiments, the dimensionality of each time-series metric is collected to perform the analysis for determining whether to adapt the time series database schema. Ingestion node 502 includes data point receiver 510, data point ingestor 520, schema determiner 530, and data point storage forwarder 540. It should be appreciated that ingestion node 502 is one node of a plurality of ingestion nodes of a distributed system for managing time series data (e.g., system 100).

In the example shown in FIG. 5, data points 505 are received. In one embodiment, time series data including data points 505 is received from an application or system. Data points 505 are received at data point receiver 510. Data point receiver 510 is configured to forward data points 505 to data point ingestor 520.

Data point ingestor 520 is configured to format or structure data points 505 according to the plurality of time series database schemas 410 a through 410 n that are applicable at time series database 130. In one embodiment, data point ingestor 520 receives the applicable time series database schemas 410 a through 410 n from schema determiner 530. Schema determiner 530 may include information identifying the time series database schemas 410 a through 410 n applicable for particular time periods. For example, schema determiner 530 may receive a schema update 245 from schema updater 240 of ingestion node 502 or of another ingestion node 102.

Data point storage forwarder 540 is configured to store data points 505 in time series database 130 according to the multiple time series database schemas 410 a through 410 n as indicated at data point ingestor 520. Because the time series database schema(s) can adapt over time and the described embodiments do not require re-indexing of previously stored data, the time series database schema(s) used to store each particular data point are determined at ingestion. For instance, the proper partition scheme can be identified from the timestamp of each data point, which indicates the partition scheme that was in use when that data point was generated. It should be appreciated that different metrics in a single batch, and data points from different time periods for a single metric, can be ingested with different schemas.

Hence, the embodiments of the present invention greatly extend beyond conventional methods of storing time series data in a time series database of a time series data monitoring system. For instance, ingesting and storing time series data according to multiple time series database schemas provides improved partitioning of the time series data, improving query performance by allowing selection of the time series database schema that performs best for a given query. For instance, upon query, fewer scans to the database are needed, and the yield of the performed scans is increased, providing improved results to queries and reducing the number of I/O operations, thereby improving processing and throughput. By storing time series data according to multiple time series database schemas, the described embodiments reduce the impact on processing and throughput of queries of time series data.

Accordingly, embodiments of the present invention amount to significantly more than merely using a computer to store time series data in a time series database of a time series data monitoring system. Instead, embodiments of the present invention specifically recite a novel process, rooted in computer technology, for storing time series data according to multiple time series database schemas, improving performance of query processing in a time series data monitoring system.

Example System for Querying a Variably Partitioned Time Series Database

Embodiments described herein provide methods for querying a variably partitioned time series database. Running a query against variably partitioned time-series data requires additional processing of the query, effectively splitting the query into multiple sub-queries. Variably partitioned time series databases include multiple time series database schemas that vary over time. The time ranges for which particular schemas are applicable are managed, such that queries are bifurcated into multiple sub-queries directed to the particular time series database schemas over the range of the query.
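Splitting a query's time range into sub-queries can be sketched as intersecting the range with the validity windows of the schemas. This minimal sketch assumes a hypothetical schedule of `(effective_from, schema_name)` entries, each in force until the next begins, and merges adjacent sub-ranges that use the same schema so that no more scans are issued than necessary.

```python
def split_query(start, end, schedule):
    """Split the half-open range [start, end) into (lo, hi, schema) sub-queries.

    schedule: (effective_from, schema_name) entries sorted by effective_from;
    each entry stays in force until the next one begins.
    """
    subqueries = []
    for i, (frm, name) in enumerate(schedule):
        until = schedule[i + 1][0] if i + 1 < len(schedule) else float("inf")
        lo, hi = max(start, frm), min(end, until)
        if lo >= hi:
            continue  # this schema's window does not overlap the query range
        if subqueries and subqueries[-1][2] == name and subqueries[-1][1] == lo:
            # Contiguous with a sub-query on the same schema: extend it instead.
            subqueries[-1] = (subqueries[-1][0], hi, name)
        else:
            subqueries.append((lo, hi, name))
    return subqueries
```

For a schedule X from 0, Y from 100, X from 200, and X again from 300, a query over [50, 350) becomes three sub-queries: X over [50, 100), Y over [100, 200), and X over [200, 350).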

During querying, transparently to a user, the system determines the dimensions to be scanned. In some embodiments, the time series database schema for each dimension is accessed and all available time series database schemas are loaded. On a per-dimension basis, the available time series database schemas are deduplicated and scan strategies for the time series database schemas are produced. In some embodiments, multiple time series database schemas can be considered over a single time window during querying, and the system can switch between different scan strategies. For instance, from t₁ to t₂, if it is determined that the best scan strategy is X, scan strategy X is used. Then, from t₂ to t₃, the best strategy could be Y, and then again back to scan strategy X from t₃ to t₄. Smaller (by time) scans can be issued for the entire time range from t₁ to t₄ which, together with deduplication, give the best possible query execution plan. Upon completion of the scan, the data streams are joined in time-ordered fashion so that the multiple variably partitioned streams of data are presented as a single continuous ordered stream of data points.
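The final time-ordered join can be sketched with Python's `heapq.merge`, which lazily merges already-sorted iterables. The `(timestamp, series_id, value)` tuple layout is an assumption, and dropping a repeated point for the same series and timestamp is an added safeguard for overlapping scan boundaries, not something the text specifies.

```python
import heapq


def join_streams(*streams):
    """Merge per-sub-query streams of (timestamp, series_id, value), each sorted
    by timestamp, into one continuous time-ordered stream."""
    current_ts, seen = None, set()
    for ts, series_id, value in heapq.merge(*streams, key=lambda p: p[0]):
        if ts != current_ts:
            current_ts, seen = ts, set()  # points sharing a timestamp arrive together
        if series_id in seen:
            continue  # assumed safeguard: skip a repeat from an overlapping scan
        seen.add(series_id)
        yield ts, series_id, value
```

Because `heapq.merge` consumes its inputs lazily, the joined stream can be returned to the executor while the sub-query scans are still producing data.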

FIG. 6A is a block diagram illustrating an embodiment of an example query planner 606 (e.g., planner 306 of FIG. 3A) of query node 104 (e.g., one of query nodes 104a through 104n of FIG. 1) for querying a variably partitioned time series database, according to embodiments. It should be appreciated that query planner 606 can also include the components of query planner 306, and vice versa, and that the described operations of query planner 606 and query planner 306 are separated herein so as not to obfuscate the described embodiments.

In one embodiment, query planner 606 generates a query plan for the time series data based on query 610. In the example shown in FIG. 6A, query 610 is received for searching the time series data of variably partitioned time series database 130. Query 610 includes a time range over which query 610 is to be run and a predicate comprising at least one dimension. Time series database 130 is "variably partitioned" in that time series database 130 comprises data points stored according to multiple time series database schemas. In some embodiments, time series database 130 includes a single time series database schema for each time instance. In other embodiments, time series database 130 includes multiple time series database schemas for at least one time instance.

In one embodiment, query 610 is a parsed query received from parser 304. Query 610 is received at query receiver 620. Query receiver 620 is configured to forward query 610 to schema determiner 630.

Schema determiner 630 receives query 610 and determines at least one time series database schema corresponding to the time range. As described above, time series database 130 includes data stored according to multiple time series database schemas such that, depending on the time range, different time series database schemas may need to be scanned. Schema determiner 630 forwards the determination of the time series database schemas applicable over the time range of query 610 to query divider 640, along with query 610.

Query divider 640 divides query 610 into a plurality of sub-queries 645, wherein each sub-query 645 corresponds to one time series database schema of the plurality of time series database schemas used by time series database 130. It should be appreciated that sub-queries 645 cover temporally adjacent portions of the time range of query 610. Query divider 640 forwards sub-queries 645 to plan generator 650. Plan generator 650 generates a query plan 660 for determining what to retrieve from time series database 130 based on sub-queries 645.
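The division performed by query divider 640 can be illustrated with a short Python sketch. It assumes the schema determination arrives as a sorted list of non-overlapping, half-open `(start, end, schema_id)` segments, and that a query carries an integer time range and an opaque predicate string. The `Query` class and the `divide_query` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Query:
    start: int          # inclusive start of the time range
    end: int            # exclusive end of the time range
    predicate: str      # e.g. "host=web01"; opaque in this sketch
    schema: str = ""    # schema a sub-query is bound to ("" = unbound)


def divide_query(query: Query, segments: List[Tuple[int, int, str]]) -> List[Query]:
    """Split `query` into one sub-query per schema segment it overlaps.

    `segments` holds (start, end, schema_id) tuples sorted by start, with
    half-open [start, end) ranges that do not overlap.
    """
    subqueries = []
    for seg_start, seg_end, schema_id in segments:
        # Clip the segment to the query's own time range.
        lo, hi = max(query.start, seg_start), min(query.end, seg_end)
        if lo < hi:  # keep only segments that intersect the query
            subqueries.append(replace(query, start=lo, end=hi, schema=schema_id))
    return subqueries
```

Each sub-query keeps the original predicate and differs only in its time range and target schema, so the resulting sub-queries cover temporally adjacent portions of the original range.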

FIG. 6B is a block diagram illustrating an example plan executor 608 (e.g., plan executor 308 of FIG. 3A) of a query node 104 (e.g., one of query nodes 104a through 104n of FIG. 1) for querying a variably partitioned time series database, in accordance with embodiments. Query plan 660, which includes multiple sub-queries, is received at query plan executor 670. Query plan executor 670 executes the multiple sub-queries, generating sub-query results 675. Sub-query results 675 are forwarded to sub-query result joiner 680, which joins the multiple sub-query results 675 into query results 685. In this fashion, the division of query 610 into multiple sub-queries 645, and the processing of those sub-queries, is transparent to the user, who receives combined query results 685 for query 610.
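A minimal Python sketch of query plan executor 670 and sub-query result joiner 680 follows. It assumes each sub-query is a `(start, end, schema_id)` tuple, that a caller-supplied `run` function executes one sub-query and returns its points in time order, and that sub-queries do not overlap in time. The names `execute_plan` and `run` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

SubQuery = Tuple[int, int, str]   # (start, end, schema_id); hypothetical shape
Point = Tuple[int, float]         # (timestamp, value)


def execute_plan(subqueries: Sequence[SubQuery],
                 run: Callable[[SubQuery], List[Point]]) -> List[Point]:
    """Run every sub-query and join the results into one time-ordered list.

    Because the sub-queries cover temporally adjacent, non-overlapping
    portions of the query's time range, concatenating their results in
    order of start time yields a single ordered result.
    """
    ordered = sorted(subqueries, key=lambda q: q[0])
    with ThreadPoolExecutor() as pool:
        # pool.map yields results in the order of `ordered`, not completion order.
        results = list(pool.map(run, ordered))
    joined: List[Point] = []
    for part in results:
        joined.extend(part)
    return joined
```

Running the sub-queries concurrently is a design choice made for this sketch. Because the portions do not overlap, the join is a simple concatenation and needs no merge step.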

FIG. 7 is a graph 700 illustrating an example of time series database schema variability over time in a time series database including a single time series database schema for each time instance, in accordance with embodiments. As illustrated in graph 700, the time series database schema for variable schema 702 varies over time (illustrated as t), e.g., due to the adaptation described above. For example, from t₀ through t₁, variable schema 702 uses time series database schema 710; from t₁ through t₂, it uses time series database schema 720; from t₂ through t₃, it uses time series database schema 710; and from t₃ through t₄, it uses time series database schema 730.

In the example of FIG. 7, a query is received for querying variable schema 702 over query time range 750. During querying of variable schema 702, the query node needs to determine the time series database schemas used over query time range 750. As illustrated, query time range 750 spans t_(a) through t_(b), where t_(a) is between t₀ and t₁ and t_(b) is between t₃ and t₄.

Accordingly, continuing with the example of FIG. 7, schema determiner 630 determines that variable schema 702 uses time series database schema 710 from t_(a) through t₁, time series database schema 720 from t₁ through t₂, time series database schema 710 from t₂ through t₃, and time series database schema 730 from t₃ through t_(b). It should be appreciated that the portion of query time range 750 spanning t_(a) through t₁ is temporally adjacent to the portion spanning t₁ through t₂, the portion spanning t₁ through t₂ is temporally adjacent to the portion spanning t₂ through t₃, and the portion spanning t₂ through t₃ is temporally adjacent to the portion spanning t₃ through t_(b).
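The determination made by schema determiner 630 in this example can be sketched as a lookup over a schema change history. The sketch assumes the history is a time-sorted list of `(effective_from, schema_id)` pairs, where each schema applies until the next entry takes effect, and it uses integer timestamps in place of t₀ through t₄. The function name `schemas_for_range` is hypothetical.

```python
import bisect
from typing import List, Tuple


def schemas_for_range(history: List[Tuple[int, str]], t_a: int, t_b: int
                      ) -> List[Tuple[int, int, str]]:
    """Return (start, end, schema_id) portions covering the range [t_a, t_b).

    `history` lists (effective_from, schema_id) pairs sorted by time; each
    schema applies until the next entry takes effect.
    """
    starts = [t for t, _ in history]
    # Index of the history entry in effect at t_a (clamped to the first entry).
    i = max(bisect.bisect_right(starts, t_a) - 1, 0)
    portions = []
    while i < len(history) and history[i][0] < t_b:
        lo = max(t_a, history[i][0])
        hi = min(t_b, history[i + 1][0]) if i + 1 < len(history) else t_b
        if lo < hi:
            portions.append((lo, hi, history[i][1]))
        i += 1
    return portions
```

If t₀ through t₄ are mapped to 0, 10, 20, 30, and 40, and query time range 750 is mapped to [5, 35), the function returns the four portions described above: schema 710, then 720, then 710, then 730.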

With reference again to FIG. 6A, schema determiner 630 forwards the determination of the time series database schemas applicable over the time range of the query to query divider 640, along with the query.

Query divider 640 divides the query into four sub-queries: the first sub-query spanning t_(a) through t₁ and corresponding to time series database schema 710, the second sub-query spanning t₁ through t₂ and corresponding to time series database schema 720, the third sub-query spanning t₂ through t₃ and corresponding to time series database schema 710, and the fourth sub-query spanning t₃ through t_(b) and corresponding to time series database schema 730.

Query divider 640 forwards the sub-queries to plan generator 650. Plan generator 650 generates a query plan 660 for determining the data to retrieve from time series database 130 based on the sub-queries. Query plan executor 670 receives query plan 660 and executes the four sub-queries defined in query plan 660. The four sub-query results 675 are joined at sub-query result joiner 680, generating query results 685 that include the four sub-query results 675.

FIG. 8 is a graph 800 illustrating an example of time series database schema variability over time in a time series database including multiple time series database schemas for each time instance, in accordance with embodiments. As illustrated in graph 800, the time series database schemas used by variable schemas 802 and 804 vary over time (illustrated as t), e.g., due to the adaptation described above. For example, from t₀ through t₂, variable schema 802 uses time series database schema 840; from t₂ through t₄, it uses time series database schema 850; from t₄ through t₅, it uses time series database schema 860; and from t₅ through t₇, it uses time series database schema 870. Similarly, from t₀ through t₁, variable schema 804 uses time series database schema 810; from t₁ through t₃, it uses time series database schema 820; from t₃ through t₆, it uses time series database schema 810; and from t₆ through t₇, it uses time series database schema 830. While FIG. 8 illustrates an example including two variable schemas, it should be appreciated that embodiments described herein are applicable to any number of variable schemas within a time series database.

In the example of FIG. 8, a query is received for querying variable schema 802 and variable schema 804 over query time range 815. During querying of variable schema 802 and variable schema 804, the query node needs to determine the time series database schemas used over query time range 815. As illustrated, query time range 815 spans t_(a) through t_(b), where t_(a) is between t₁ and t₂, and t_(b) is between t₅ and t₆.

Accordingly, continuing with the example of FIG. 8, schema determiner 630 determines that variable schema 802 uses time series database schema 840 from t_(a) through t₂, time series database schema 850 from t₂ through t₄, time series database schema 860 from t₄ through t₅, and time series database schema 870 from t₅ through t_(b). Similarly, schema determiner 630 determines that variable schema 804 uses time series database schema 820 from t_(a) through t₃ and time series database schema 810 from t₃ through t_(b).

It should be appreciated that, because the combination of schemas in use changes at t₂, t₃, t₄, and t₅, query time range 815 divides into five temporally adjacent portions. The portion spanning t_(a) through t₂ is temporally adjacent to the portion spanning t₂ through t₃, which is temporally adjacent to the portion spanning t₃ through t₄, which is temporally adjacent to the portion spanning t₄ through t₅, which in turn is temporally adjacent to the portion spanning t₅ through t_(b).
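With several variable schemas, the query time range can be cut at every point where any variable schema changes, so that the combination of schemas is constant within each portion. The Python sketch below illustrates this. It assumes each variable schema's history is a time-sorted list of `(effective_from, schema_id)` pairs whose first entry is at or before t_(a), and it uses integers in place of t₀ through t₇. The names `schema_at` and `constant_portions` are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple

History = List[Tuple[int, str]]  # (effective_from, schema_id), sorted by time


def schema_at(history: History, t: int) -> str:
    """Schema of one variable schema in effect at time t (t >= first entry)."""
    i = bisect.bisect_right([s for s, _ in history], t) - 1
    return history[i][1]


def constant_portions(histories: Dict[str, History], t_a: int, t_b: int
                      ) -> List[Tuple[int, int, Tuple[str, ...]]]:
    """Split [t_a, t_b) wherever any variable schema changes.

    Returns (start, end, schemas) tuples, where `schemas` holds the schema
    in effect for each variable schema (in sorted name order) over that
    portion.
    """
    # Every schema change strictly inside the range is a portion boundary.
    cuts = {t for h in histories.values() for t, _ in h if t_a < t < t_b}
    bounds = [t_a, *sorted(cuts), t_b]
    names = sorted(histories)
    return [(lo, hi, tuple(schema_at(histories[n], lo) for n in names))
            for lo, hi in zip(bounds, bounds[1:])]
```

If t₀ through t₇ are mapped to 0, 10, ..., 70 and query time range 815 is mapped to [15, 55), the sketch yields the five portions above, with candidate pairs (840, 820), (850, 820), (850, 810), (860, 810), and (870, 810).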

In some embodiments, for each temporally adjacent portion of the time range, schema determiner 630 selects a time series database schema of the multiple time series database schemas on which to execute the query. Since multiple time series database schemas are available for each temporally adjacent portion of the time range, schema determiner 630 selects the one upon which the query will be run. For example, schema determiner 630 selects one of schema 820 and schema 840 for the portion of query time range 815 spanning t_(a) through t₂, one of schema 820 and schema 850 for the portion spanning t₂ through t₃, one of schema 810 and schema 850 for the portion spanning t₃ through t₄, one of schema 810 and schema 860 for the portion spanning t₄ through t₅, and one of schema 810 and schema 870 for the portion spanning t₅ through t_(b).

In other embodiments, the query can be run against all available schemas for each temporally adjacent portion of the time range, and the plan executor deduplicates the results upon execution of the sub-queries.

With reference again to FIG. 6A, schema determiner 630 forwards the selection of the time series database schemas applicable over the time range of the query to query divider 640, along with the query.

Query divider 640 divides the query into five sub-queries: the first sub-query spanning t_(a) through t₂ and corresponding to one of time series database schemas 820 and 840, the second sub-query spanning t₂ through t₃ and corresponding to one of time series database schemas 820 and 850, the third sub-query spanning t₃ through t₄ and corresponding to one of time series database schemas 810 and 850, the fourth sub-query spanning t₄ through t₅ and corresponding to one of time series database schemas 810 and 860, and the fifth sub-query spanning t₅ through t_(b) and corresponding to one of time series database schemas 810 and 870.

Query divider 640 forwards the sub-queries to plan generator 650. Plan generator 650 generates a query plan 660 for determining the data to retrieve from time series database 130 based on the sub-queries. Query plan executor 670 receives query plan 660 and executes the five sub-queries defined in query plan 660. The five sub-query results 675 are joined at sub-query result joiner 680, generating query results 685 that include the five sub-query results 675.

Hence, embodiments of the present invention greatly extend beyond conventional methods of storing time series data in a time series database of a time series data monitoring system. For instance, ingesting and storing time series data according to multiple time series database schemas provides improved partitioning of the time series data. This improves query performance by allowing selection of the time series database schema that performs best for a given query. For instance, upon query, fewer scans of the database are needed and the yield of the performed scans is increased. This provides improved results to queries and reduces the number of I/O operations, thereby improving processing and throughput. By storing time series data according to multiple time series database schemas, the described embodiments reduce the impact of queries of time series data on processing and throughput.

Accordingly, embodiments of the present invention amount to significantly more than merely using a computer to store time series data in a time series database of a time series data monitoring system. Instead, embodiments of the present invention specifically recite a novel process, rooted in computer technology, for storing time series data according to multiple time series database schemas, improving performance of query processing in a time series data monitoring system.

FIG. 9 is a block diagram of an example computer system 900 upon which embodiments of the present invention can be implemented. FIG. 9 illustrates one example of a type of computer system 900 (e.g., a computer system) that can be used in accordance with or to implement various embodiments which are discussed herein.

It is appreciated that computer system 900 of FIG. 9 is only an example and that embodiments as described herein can operate on or within a number of different computer systems including, but not limited to, general purpose networked computer systems, embedded computer systems, mobile electronic devices, smart phones, server devices, client devices, various intermediate devices/nodes, standalone computer systems, media centers, handheld computer systems, multi-media devices, and the like. In some embodiments, computer system 900 of FIG. 9 is well adapted to having peripheral tangible computer-readable storage media 902 such as, for example, an electronic flash memory data storage device, a floppy disc, a compact disc, digital versatile disc, other disc based storage, universal serial bus "thumb" drive, removable memory card, and the like coupled thereto. The tangible computer-readable storage media is non-transitory in nature.

Computer system 900 of FIG. 9 includes an address/data bus 904 for communicating information, and a processor 906A coupled with bus 904 for processing information and instructions. As depicted in FIG. 9, computer system 900 is also well suited to a multi-processor environment in which a plurality of processors 906A, 906B, and 906C are present. Conversely, computer system 900 is also well suited to having a single processor such as, for example, processor 906A. Processors 906A, 906B, and 906C may be any of various types of microprocessors. Computer system 900 also includes data storage features such as a computer usable volatile memory 908, e.g., random access memory (RAM), coupled with bus 904 for storing information and instructions for processors 906A, 906B, and 906C. Computer system 900 also includes computer usable non-volatile memory 910, e.g., read only memory (ROM), coupled with bus 904 for storing static information and instructions for processors 906A, 906B, and 906C. Also present in computer system 900 is a data storage unit 912 (e.g., a magnetic or optical disc and disc drive) coupled with bus 904 for storing information and instructions. Computer system 900 also includes an alphanumeric input device 914 including alphanumeric and function keys coupled with bus 904 for communicating information and command selections to processor 906A or processors 906A, 906B, and 906C. Computer system 900 also includes a cursor control device 916 coupled with bus 904 for communicating user input information and command selections to processor 906A or processors 906A, 906B, and 906C. In one embodiment, computer system 900 also includes a display device 918 coupled with bus 904 for displaying information.

Referring still to FIG. 9, display device 918 of FIG. 9 may be a liquid crystal display (LCD), light emitting diode display (LED) device, cathode ray tube (CRT), plasma display device, a touch screen device, or other display device suitable for creating graphic images and alphanumeric characters recognizable to a user. Cursor control device 916 allows the computer user to dynamically signal the movement of a visible symbol (cursor) on a display screen of display device 918 and indicate user selections of selectable items displayed on display device 918. Many implementations of cursor control device 916 are known in the art, including a trackball, mouse, touch pad, touch screen, joystick, or special keys on alphanumeric input device 914 capable of signaling movement in a given direction or manner of displacement. Alternatively, it will be appreciated that a cursor can be directed and/or activated via input from alphanumeric input device 914 using special keys and key sequence commands. Computer system 900 is also well suited to having a cursor directed by other means such as, for example, voice commands. In various embodiments, alphanumeric input device 914, cursor control device 916, and display device 918, or any combination thereof (e.g., user interface selection devices), may collectively operate to provide a graphical user interface (GUI) 930 under the direction of a processor (e.g., processor 906A or processors 906A, 906B, and 906C). GUI 930 allows a user to interact with computer system 900 through graphical representations presented on display device 918 by interacting with alphanumeric input device 914 and/or cursor control device 916.

Computer system 900 also includes an I/O device 920 for coupling computer system 900 with external entities. For example, in one embodiment, I/O device 920 is a modem for enabling wired or wireless communications between computer system 900 and an external network such as, but not limited to, the Internet. In one embodiment, I/O device 920 includes a transmitter. Computer system 900 may communicate with a network by transmitting data via I/O device 920.

Referring still to FIG. 9, various other components are depicted for computer system 900. Specifically, when present, an operating system 922, applications 924, modules 926, and data 928 are shown as typically residing in one or some combination of computer usable volatile memory 908 (e.g., RAM), computer usable non-volatile memory 910 (e.g., ROM), and data storage unit 912. In some embodiments, all or portions of various embodiments described herein are stored, for example, as an application 924 and/or module 926 in memory locations within RAM 908, computer-readable storage media within data storage unit 912, peripheral computer-readable storage media 902, and/or other tangible computer-readable storage media.

Example Methods of Operation

The following discussion sets forth in detail the operation of some example methods of operation of embodiments. With reference to FIGS. 10 through 16, flow diagrams 1000, 1100, 1200, 1300, 1400, 1500, and 1600 illustrate example procedures used by various embodiments. Flow diagrams 1000, 1100, 1200, 1300, 1400, 1500, and 1600 include some procedures that, in various embodiments, are carried out by a processor under the control of computer-readable and computer-executable instructions. In this fashion, procedures described herein and in conjunction with the flow diagrams are, or may be, implemented using a computer, in various embodiments. The computer-readable and computer-executable instructions can reside in any tangible computer readable storage media. Some non-limiting examples of tangible computer readable storage media include random access memory, read only memory, magnetic disks, solid state drives/“disks,” and optical disks, any or all of which may be employed with computer environments (e.g., computer system 900). The computer-readable and computer-executable instructions, which reside on tangible computer readable storage media, are used to control or operate in conjunction with, for example, one or some combination of processors of the computer environments and/or virtualized environment. It is appreciated that the processor(s) may be physical or virtual or some combination (it should also be appreciated that a virtual processor is implemented on physical hardware). Although specific procedures are disclosed in the flow diagrams, such procedures are examples. That is, embodiments are well suited to performing various other procedures or variations of the procedures recited in the flow diagrams. Likewise, in some embodiments, the procedures in flow diagrams 1000, 1100, 1200, 1300, 1400, 1500, and 1600 may be performed in an order different than presented and/or not all of the procedures described in flow diagrams 1000, 1100, 1200, 1300, 1400, 1500, and 1600 may be performed.
It is further appreciated that procedures described in flow diagrams 1000, 1100, 1200, 1300, 1400, 1500, and 1600 may be implemented in hardware, or a combination of hardware with firmware and/or software provided by computer system 900.

FIG. 10 depicts a flow diagram 1000 for adapting a time series database schema based on received data points, according to an embodiment. At procedure 1010 of flow diagram 1000, time series data ingested into a time series database according to a time series database schema over a time period is accessed, wherein the time series data comprises a plurality of dimensions. In one embodiment, the time series database schema includes a plurality of shards, each shard corresponding to a dimension of the plurality of dimensions.

At procedure 1020, the time series data of the time period is analyzed to determine a data shape of the time series data of the time period. In one embodiment, as shown at procedure 1022, the time series data of the time period is analyzed to determine at least a partitionability of dimensions of the plurality of dimensions.

At procedure 1030, it is determined whether to adapt the time series database schema based at least in part on the data shape of the time series data of the time period. In one embodiment, as shown at procedure 1032, the determination whether to adapt the plurality of shards of the time series database schema is based at least in part on the partitionability of dimensions of the plurality of dimensions. If it is determined not to adapt the time series database schema, no action is taken, as shown at procedure 1035. In one embodiment, flow diagram 1000 returns to procedure 1010.

In one embodiment, as shown at procedure 1040, the time series database schema is adapted based at least in part on the data shape of the time series data of the time period. In one embodiment, as shown at procedure 1042, the sharding parameters of the time series database schema are adapted. In one embodiment, a shard corresponding to a dimension of the plurality of dimensions is added to the time series database schema. In one embodiment, a shard corresponding to a dimension of the plurality of dimensions is removed from the time series database schema. In one embodiment, flow diagram 1000 returns to procedure 1010.
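Procedures 1022 through 1042 could be sketched as follows. This section does not define how partitionability is measured, so the sketch substitutes a simple stand-in: a dimension counts as partitionable when the number of distinct values it takes during the time period falls between two thresholds. Too few values give coarse partitions, and too many give an excessive number of shards. The thresholds `MIN_DISTINCT` and `MAX_DISTINCT` and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical thresholds; distinct-value counts are only a stand-in for
# whatever partitionability measure an implementation actually uses.
MIN_DISTINCT = 4
MAX_DISTINCT = 1000


def partitionable_dimensions(points: Iterable[Dict[str, str]]) -> Set[str]:
    """Dimensions whose distinct-value count falls inside the thresholds."""
    values: Dict[str, Set[str]] = defaultdict(set)
    for tags in points:  # each point is modeled as its dimension -> value map
        for dim, val in tags.items():
            values[dim].add(val)
    return {d for d, vs in values.items() if MIN_DISTINCT <= len(vs) <= MAX_DISTINCT}


def adapt_shards(current: List[str], points: Iterable[Dict[str, str]]) -> List[str]:
    """Return the adapted shard dimensions for the time period.

    Shards on non-partitionable dimensions are removed and shards on newly
    partitionable dimensions are added. An unchanged result means no
    adaptation (procedure 1035).
    """
    good = partitionable_dimensions(points)
    kept = [d for d in current if d in good]   # remove shards
    added = sorted(good - set(current))        # add shards
    return kept + added
```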

In one embodiment, as shown at procedure 1050, time series data is ingested into the time series database according to the time series database schema. In one embodiment, procedure 1050 is performed according to flow diagram 1100 of FIG. 11.

FIG. 11 depicts a flow diagram 1100 for ingesting time series data into the time series database, according to an embodiment. At procedure 1110 of flow diagram 1100, a timestamp of a data point of the time series data being ingested into the time series database is identified. At procedure 1120, the time series database schema corresponding to the timestamp is identified. At procedure 1130, the data point is ingested into the time series database according to the time series database schema corresponding to the timestamp.
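Procedures 1110 through 1130 amount to a timestamp-indexed schema lookup followed by a write. The Python sketch below assumes schema versions are registered with the time from which they apply, and it models storage as an in-memory dictionary keyed by schema identifier. `SchemaTimeline` and `ingest` are hypothetical names. Because the schema is chosen by the data point's own timestamp, a late-arriving point is stored under the schema in effect at its timestamp rather than the schema current at arrival time.

```python
import bisect
from typing import Dict, List, Tuple


class SchemaTimeline:
    """Maps timestamps to the time series database schema in effect.

    Each registered schema applies from its effective time until the next
    registered schema takes effect.
    """

    def __init__(self) -> None:
        self._starts: List[int] = []
        self._schemas: List[str] = []

    def add(self, effective_from: int, schema_id: str) -> None:
        i = bisect.bisect_right(self._starts, effective_from)
        self._starts.insert(i, effective_from)
        self._schemas.insert(i, schema_id)

    def schema_for(self, timestamp: int) -> str:
        i = bisect.bisect_right(self._starts, timestamp) - 1
        if i < 0:
            raise KeyError(f"no schema in effect at {timestamp}")
        return self._schemas[i]


def ingest(point: Tuple[int, str, float], timeline: SchemaTimeline,
           store: Dict[str, list]) -> None:
    timestamp = point[0]                           # procedure 1110
    schema_id = timeline.schema_for(timestamp)     # procedure 1120
    store.setdefault(schema_id, []).append(point)  # procedure 1130
```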

FIG. 12 depicts a flow diagram 1200 for adapting a time series database schema based on analysis of received queries, according to an embodiment. At procedure 1210 of flow diagram 1200, a plurality of queries to a time series database received over a time period are accessed. The time series data is ingested into the time series database according to a time series database schema, wherein the time series data comprises a plurality of dimensions. In one embodiment, the time series database schema includes a plurality of shards, each shard corresponding to a dimension of the plurality of dimensions.

At procedure 1220, the plurality of queries of the time period are analyzed to determine a relative frequency of the plurality of dimensions within the plurality of queries over the time period. In one embodiment, as shown at procedure 1222, the number of times each dimension of the plurality of dimensions is a predicate comprised within the plurality of queries over the time period is determined. At procedure 1224, the plurality of dimensions are ranked according to the number of times each dimension is a predicate comprised within the plurality of queries over the time period to generate a dimension frequency order list.

At procedure 1230, it is determined whether to adapt the time series database schema based at least in part on the relative frequency of the plurality of dimensions within the plurality of queries over the time period. In one embodiment, the determination whether to adapt the plurality of shards of the time series database schema is based at least in part on whether the plurality of shards corresponds to at least one dimension ranked high within the dimension frequency order list. Provided the plurality of shards do not correspond to at least one dimension ranked high within the dimension frequency order list, it is determined to adapt the time series database schema. If it is determined not to adapt the time series database schema, no action is taken, as shown at procedure 1235. In one embodiment, flow diagram 1200 returns to procedure 1210.
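The ranking in procedures 1222 and 1224, and the shard check in procedure 1230, could be sketched as follows. The sketch models each query by the set of dimensions appearing in its predicate. It also treats "ranked high" as "within the first `top_n` entries of the dimension frequency order list", a cutoff the document leaves open, so `top_n` and both function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def dimension_frequency_order(queries: Iterable[Set[str]]) -> List[str]:
    """Procedures 1222-1224: count predicate dimensions and rank them."""
    counts = Counter(dim for predicate_dims in queries for dim in predicate_dims)
    # Descending count; ties broken by name so the order is deterministic.
    return sorted(counts, key=lambda d: (-counts[d], d))


def should_adapt(shard_dims: List[str], order: List[str], top_n: int = 2) -> bool:
    """Procedure 1230: adapt when no shard matches a highly ranked dimension."""
    return not set(shard_dims) & set(order[:top_n])
```

For example, with queries whose predicates use `{host}`, `{host, region}`, `{service}`, `{host}`, and `{region}`, the order list is `host`, `region`, `service`. A schema sharded only on `service` would then be flagged for adaptation.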

In one embodiment, procedure 1230 is performed according to flow diagram 1300 of FIG. 13A. At procedure 1310 of flow diagram 1300, it is determined that a data shape of time series data ingested into the time series database is indicative of an adaptation of the time series database schema by adding a shard corresponding to a particular dimension. At procedure 1320, provided the particular dimension has a high relative frequency of the plurality of dimensions within the plurality of queries over the time period, it is determined to adapt the time series database schema to add the shard corresponding to the particular dimension. At procedure 1330, provided the particular dimension has a low relative frequency of the plurality of dimensions within the plurality of queries over the time period, it is determined not to adapt the time series database schema.

In another embodiment, procedure 1230 is performed according to flow diagram 1350 of FIG. 13B. At procedure 1360 of flow diagram 1350, it is determined that a data shape of time series data ingested into the time series database is indicative of an adaptation of the time series database schema by removing a shard corresponding to a particular dimension. At procedure 1370, provided the particular dimension has a low relative frequency of the plurality of dimensions within the plurality of queries over the time period, it is determined to adapt the time series database schema to remove the shard corresponding to the particular dimension. At procedure 1380, provided the particular dimension has a high relative frequency of the plurality of dimensions within the plurality of queries over the time period, it is determined not to adapt the time series database schema.
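Flow diagrams 1300 and 1350 together gate a data-shape-driven change on query frequency. A minimal sketch of that rule follows. It assumes relative frequency is expressed as the fraction of queries that use a dimension as a predicate, and it separates "high" from "low" with a single cutoff, `HIGH_FREQUENCY`, whose value is hypothetical, as are the names `Action` and `decide`.

```python
from enum import Enum
from typing import Dict


class Action(Enum):
    ADD_SHARD = "add"        # indicated by the data shape (FIG. 13A)
    REMOVE_SHARD = "remove"  # indicated by the data shape (FIG. 13B)


# Hypothetical cutoff separating "high" from "low" relative frequency.
HIGH_FREQUENCY = 0.25


def decide(indicated: Action, dimension: str, frequency: Dict[str, float]) -> bool:
    """Return True if the schema should be adapted as the data shape indicates.

    `frequency` maps each dimension to the fraction of queries in the time
    period that use it as a predicate.
    """
    high = frequency.get(dimension, 0.0) >= HIGH_FREQUENCY
    if indicated is Action.ADD_SHARD:
        return high      # procedures 1320 / 1330
    return not high      # procedures 1370 / 1380
```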

In one embodiment, as shown at procedure 1240, the time series database schema is adapted based at least in part on the relative frequency of the plurality of dimensions within the plurality of queries over the time period. In one embodiment, as shown at procedure 1242, the sharding parameters of the time series database schema are adapted. In one embodiment, a shard corresponding to a dimension of the plurality of dimensions having a high relative frequency within the plurality of queries over the time period is added to the time series database schema. In one embodiment, a shard corresponding to a dimension of the plurality of dimensions having a low relative frequency within the plurality of queries over the time period is removed from the time series database schema. In one embodiment, flow diagram 1200 returns to procedure 1210.

In one embodiment, as shown at procedure 1250, time series data is ingested into the time series database according to the time series database schema. In one embodiment, procedure 1250 is performed according to flow diagram 1100 of FIG. 11.

FIG. 14 depicts a flow diagram 1400 for maintaining a time series database including a plurality of time series database schemas, according to various embodiments. At procedure 1410 of flow diagram 1400, time series data including data points is received at an ingestion node of a system for maintaining a time series database, where the data points include a plurality of dimensions.

At procedure 1420, a plurality of time series database schemas of the time series database is determined for storing the time series data. In some embodiments, each time series database schema of the plurality of time series database schemas includes a plurality of shards, each shard corresponding to a dimension of the plurality of dimensions.

At procedure 1430, the time series data is ingested according to the plurality of time series database schemas, wherein each data point is stored according to each time series database schema of the plurality of time series database schemas, such that the time series database comprises multiple instances of each data point.
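Procedure 1430 writes one instance of every data point per schema. The Python sketch below illustrates this. It assumes each schema is described only by its ordered shard dimensions, that a point's partition key under a schema is the tuple of its values for those dimensions, and that storage is an in-memory dictionary. The names `shard_key` and `ingest_all` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, Dict[str, str], float]   # (timestamp, dimensions, value)


def shard_key(schema_dims: List[str], dims: Dict[str, str]) -> Tuple[str, ...]:
    """Partition key under one schema: the point's values for its shard dimensions."""
    return tuple(dims.get(d, "") for d in schema_dims)


def ingest_all(point: Point, schemas: Dict[str, List[str]],
               store: Dict[Tuple[str, Tuple[str, ...]], List[Point]]) -> None:
    """Store one instance of `point` under every schema (procedure 1430)."""
    _, dims, _ = point
    for schema_id, schema_dims in schemas.items():
        store.setdefault((schema_id, shard_key(schema_dims, dims)), []).append(point)
```

The cost is storage proportional to the number of schemas. The benefit, as described above, is that a query can later be routed to whichever instance is partitioned on the dimensions it filters by.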

FIG. 15 depicts a flow diagram 1500 for querying a variably partitioned time series database, according to various embodiments. At procedure 1510 of flow diagram 1500, a query of a time series database is received, the query including a time range and a predicate comprising at least one dimension, wherein the time series database includes a plurality of time series database schemas. In one embodiment, the time series database includes a single time series database schema for each time instance. In another embodiment, the time series database includes multiple time series database schemas for at least one time instance.

At procedure 1520, at least one time series database schema of the time series database corresponding to the time range is determined. In one embodiment, where the time series database includes multiple time series database schemas for at least one time instance, procedure 1520 is performed according to flow diagram 1600 of FIG. 16. At procedure 1610 of flow diagram 1600, it is determined which of the multiple time series database schemas correspond to the time range. At procedure 1620, temporally adjacent portions of the time range for which the multiple time series database schemas are constant are determined. At procedure 1630, a time series database schema of the multiple time series database schemas on which to execute the query is selected for each temporally adjacent portion of the time range.

In one embodiment, as shown at procedure 1632, the time series database schema of the multiple time series database schemas that provides the highest yield is selected for each temporally adjacent portion. In another embodiment, as shown at procedure 1634, the time series database schema of the multiple time series database schemas is selected according to the at least one dimension of the query.
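The selection in procedures 1632 and 1634 could be combined into one scoring rule, as in the Python sketch below. This is an assumption made for illustration: the sketch first prefers a candidate schema sharded on a dimension named in the query predicate, then prefers the candidate with the highest estimated yield, taken here to mean the expected fraction of scanned points that match the query. Where yield estimates come from is left open, and `select_schema` is a hypothetical name.

```python
from typing import Dict, List, Optional, Sequence


def select_schema(candidates: Sequence[str],
                  shard_dims: Dict[str, List[str]],
                  predicate_dims: Sequence[str],
                  estimated_yield: Optional[Dict[str, float]] = None) -> str:
    """Choose one schema for a temporally adjacent portion of the time range.

    Candidates sharded on a predicate dimension win (procedure 1634); ties
    are broken by the highest estimated yield (procedure 1632).
    """
    estimated_yield = estimated_yield or {}

    def score(schema: str):
        matches = any(d in predicate_dims for d in shard_dims.get(schema, []))
        return (matches, estimated_yield.get(schema, 0.0))

    return max(candidates, key=score)
```

In terms of the FIG. 8 example, if schema 840 were sharded on a `host` dimension and schema 820 on a `region` dimension, a query filtering on `host` would be routed to schema 840 for the portion spanning t_(a) through t₂.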

At procedure 1530, the query is divided into a plurality of sub-queries, wherein each sub-query of the plurality of sub-queries corresponds to one time series database schema of the plurality of time series database schemas. In one embodiment, as shown at procedure 1532, the query is divided into a first sub-query and a second sub-query, wherein the first sub-query corresponds to a first time series database schema of the plurality of time series database schemas and the predicate comprises a first dimension, and wherein the second sub-query corresponds to a second time series database schema of the plurality of time series database schemas and the predicate comprises a second dimension different than the first dimension. In another embodiment, as shown at procedure 1534, the query is divided into the plurality of sub-queries, each sub-query of the plurality of sub-queries corresponding to one temporally adjacent portion of the time range and one time series database schema.
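
Procedure 1534 can be sketched as turning a per-portion plan into sub-queries. The sketch assumes the plan is a list of `(portion_start, portion_end, schema_name)` tuples produced by an earlier planning step. `SubQuery` and `divide_query` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubQuery:
    schema: str
    start: int        # inclusive
    end: int          # exclusive
    predicate: tuple  # sorted (dimension, value) pairs


def divide_query(start, end, predicate, plan):
    """Procedure 1534 sketch: emit one sub-query per temporally adjacent portion,
    each bound to the single schema chosen for that portion."""
    pred = tuple(sorted(predicate.items()))
    subqueries = []
    for lo, hi, schema in plan:
        # Clip the portion so each sub-query stays within the requested range.
        lo, hi = max(lo, start), min(hi, end)
        if lo < hi:
            subqueries.append(SubQuery(schema, lo, hi, pred))
    return subqueries
```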

At procedure 1540, the plurality of sub-queries is executed to return a plurality of results. In one embodiment, as shown at procedure 1550, the plurality of results is joined into a combined result.
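
Procedures 1540 and 1550 can be sketched with an in-memory store per schema and Python's `heapq.merge` to join the time-ordered partial results. The store layout (rows of `(timestamp, tags, value)` sorted by timestamp) and the tuple form of a sub-query are assumptions made for illustration. Because the sub-queries cover disjoint portions of the time range, each data point appears at most once in the combined result even though every schema holds its own copy.

```python
import heapq


def execute(subquery, stores):
    """Run one sub-query against the hypothetical in-memory store for its schema.

    `subquery` is (schema, start, end, predicate_dict); `stores` maps a schema
    name to a list of (timestamp, tags_dict, value) rows sorted by timestamp.
    """
    schema, start, end, predicate = subquery
    return [
        (ts, value)
        for ts, tags, value in stores[schema]
        if start <= ts < end and all(tags.get(k) == v for k, v in predicate.items())
    ]


def run_and_join(subqueries, stores):
    """Procedures 1540 and 1550 sketch: execute each sub-query, then merge the
    per-sub-query results (each already sorted by timestamp) into one list."""
    results = [execute(sq, stores) for sq in subqueries]
    return list(heapq.merge(*results))
```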

It is noted that any of the procedures stated above regarding the flow diagrams of FIGS. 10 through 16 may be implemented in hardware, or in a combination of hardware with firmware and/or software. For example, any of the procedures may be implemented by one or more processors of a cloud environment and/or a computing environment.

One or more embodiments of the present invention may be implemented as one or more computer programs or as one or more computer program modules embodied in one or more computer readable media. The term computer readable medium refers to any data storage device that can store data which can thereafter be input to a computer system; computer readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer. Examples of a computer readable medium include a hard drive, network attached storage (NAS), read-only memory, random-access memory (e.g., a flash memory device), a CD (Compact Disc) such as a CD-ROM, a CD-R, or a CD-RW, a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over a network-coupled computer system so that the computer readable code is stored and executed in a distributed fashion.

Although one or more embodiments of the present invention have been described in some detail for clarity of understanding, it will be apparent that certain changes and modifications may be made within the scope of the claims. Accordingly, the described embodiments are to be considered as illustrative and not restrictive, and the scope of the claims is not to be limited to details given herein, but may be modified within the scope and equivalents of the claims. In the claims, elements and/or steps do not imply any particular order of operation, unless explicitly stated in the claims.

Many variations, modifications, additions, and improvements are possible, regardless of the degree of virtualization. Plural instances may be provided for components, operations or structures described herein as a single instance. Finally, boundaries between various components, operations and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the invention(s). In general, structures and functionality presented as separate components in exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the appended claim(s).

1. A method for operating on a time series database comprising a plurality of time series database schemas, the method comprising: receiving a query of a time series database, wherein the time series database comprises a plurality of time series database schemas, and wherein each received data point is stored according to each time series database schema of the plurality of time series database schemas, such that the time series database comprises multiple instances of each data point; and generating a query plan according to the query and the plurality of time series database schemas corresponding to a time range, wherein the generating the query plan according to the query and the plurality of time series database schemas corresponding to the time range comprises: determining temporally adjacent portions of the time range for which the plurality of time series database schemas are constant; for each temporally adjacent portion of the time range, selecting a time series database schema of the plurality of time series database schemas on which to execute the query; and dividing the query into a plurality of sub-queries, wherein each sub-query of the plurality of sub-queries corresponds to one time series database schema of the plurality of time series database schemas.
 2. The method of claim 1, wherein each time series database schema of the plurality of time series database schemas comprises a plurality of shards, each shard corresponding to a dimension of a plurality of dimensions.
 3. The method of claim 1, further comprising: receiving the query of the time series database, the query comprising the time range and a predicate comprising at least one dimension; and determining which of the plurality of time series database schemas correspond to the time range.
 4. The method of claim 3, further comprising executing the query to return results.
 5. The method of claim 3, wherein the selecting a time series database schema of the plurality of time series database schemas on which to execute the query comprises: selecting the time series database schema of the plurality of time series database schemas for each temporally adjacent portion that provides a highest yield.
 6. The method of claim 3, wherein the selecting a time series database schema of the plurality of time series database schemas on which to execute the query comprises: selecting the time series database schema of the plurality of time series database schemas according to the at least one dimension of the query.
 7. The method of claim 3, wherein the generating the query plan according to the query and the plurality of time series database schemas corresponding to the time range comprises: determining temporally adjacent portions of the time range for which the plurality of time series database schemas are constant; and dividing the query into a plurality of sub-queries, wherein each sub-query of the plurality of sub-queries corresponds to each time series database schema of the plurality of time series database schemas for each temporally adjacent portion of the time range.
 8. A non-transitory computer readable storage medium having computer readable program code stored thereon for causing a computer system to perform a method for operating on a time series database comprising a plurality of time series database schemas, the method comprising: receiving a query of a time series database, wherein the time series database comprises a plurality of time series database schemas, and wherein each received data point is stored according to each time series database schema of the plurality of time series database schemas, such that the time series database comprises multiple instances of each data point; and generating a query plan according to the query and the plurality of time series database schemas corresponding to a time range, wherein the generating the query plan according to the query and the plurality of time series database schemas corresponding to the time range comprises: determining temporally adjacent portions of the time range for which the plurality of time series database schemas are constant; for each temporally adjacent portion of the time range, selecting a time series database schema of the plurality of time series database schemas on which to execute the query; and dividing the query into a plurality of sub-queries, wherein each sub-query of the plurality of sub-queries corresponds to one time series database schema of the plurality of time series database schemas.
 9. The non-transitory computer readable storage medium of claim 8, wherein each time series database schema of the plurality of time series database schemas comprises a plurality of shards, each shard corresponding to a dimension of a plurality of dimensions.
 10. The non-transitory computer readable storage medium of claim 8, the method further comprising: receiving the query of the time series database, the query comprising the time range and a predicate comprising at least one dimension; and determining which of the plurality of time series database schemas correspond to the time range.
 11. The non-transitory computer readable storage medium of claim 10, the method further comprising: executing the query to return results.
 12. The non-transitory computer readable storage medium of claim 10, wherein the selecting a time series database schema of the plurality of time series database schemas on which to execute the query comprises: selecting the time series database schema of the plurality of time series database schemas for each temporally adjacent portion that provides a highest yield.
 13. The non-transitory computer readable storage medium of claim 10, wherein the selecting a time series database schema of the plurality of time series database schemas on which to execute the query comprises: selecting the time series database schema of the plurality of time series database schemas according to the at least one dimension of the query.
 14. The non-transitory computer readable storage medium of claim 10, wherein the generating the query plan according to the query and the plurality of time series database schemas corresponding to the time range comprises: determining temporally adjacent portions of the time range for which the plurality of time series database schemas are constant; and dividing the query into a plurality of sub-queries, wherein each sub-query of the plurality of sub-queries corresponds to each time series database schema of the plurality of time series database schemas for each temporally adjacent portion of the time range.
 15. A system for operating on a time series database comprising a plurality of time series database schemas, the system comprising: a plurality of query nodes, each query node of the plurality of query nodes comprising a data storage unit and a processor communicatively coupled with the data storage unit, wherein a query node of the plurality of query nodes is configured to: receive a query of a time series database, wherein the time series database comprises a plurality of time series database schemas, and wherein each received data point is stored according to each time series database schema of the plurality of time series database schemas, such that the time series database comprises multiple instances of each data point; and generate a query plan according to the query and the plurality of time series database schemas corresponding to a time range; determine temporally adjacent portions of the time range for which the plurality of time series database schemas are constant; select a time series database schema of the plurality of time series database schemas on which to execute the query for each temporally adjacent portion of the time range; and divide the query into a plurality of sub-queries, wherein each sub-query of the plurality of sub-queries corresponds to one time series database schema of the plurality of time series database schemas.
 16. The system of claim 15, wherein each time series database schema of the plurality of time series database schemas comprises a plurality of shards, each shard corresponding to a dimension of a plurality of dimensions.
 17. The system of claim 15, wherein the query node of the plurality of query nodes is configured to: receive the query of the time series database, the query comprising the time range and a predicate comprising at least one dimension; determine which of the plurality of time series database schemas correspond to the time range; and execute the query to return results.
 18. The system of claim 17, wherein the query node of the plurality of query nodes is configured to: select the time series database schema of the plurality of time series database schemas for each temporally adjacent portion that provides a highest yield.
 19. The system of claim 17, wherein the query node of the plurality of query nodes is configured to: select the time series database schema of the plurality of time series database schemas according to the at least one dimension of the query.
 20. The system of claim 17, wherein the query node of the plurality of query nodes is configured to: determine temporally adjacent portions of the time range for which the plurality of time series database schemas are constant; and divide the query into a plurality of sub-queries, wherein each sub-query of the plurality of sub-queries corresponds to each time series database schema of the plurality of time series database schemas for each temporally adjacent portion of the time range.